MIME-Version: 1.0
References: <CAB+=1TX+Av1Fx+Q4YOmUGioUoa8TQ8kGa1h06zPSEona2az39A@mail.gmail.com>
 <CAKna9VajLFW=9Z1Y9ar0WJXKeGTgYXivFtBmdt=gXJoLs4s2Rw@mail.gmail.com>
In-Reply-To: <CAKna9VajLFW=9Z1Y9ar0WJXKeGTgYXivFtBmdt=gXJoLs4s2Rw@mail.gmail.com>
From: veem v <veema0000@gmail.com>
Date: Sun, 9 Jun 2024 10:21:55 +0530
Message-ID: <CAB+=1TUghHyWXDhEqeWhzWRgWJPy44pr7VkhX+v_-nCph6GWgA@mail.gmail.com>
Subject: Re: How to create efficient index in this scenario?
To: Lok P <loknath.73@gmail.com>
Cc: pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000009e8f6f061a6dcbc5"
Archived-At: <https://www.postgresql.org/message-id/CAB%2B%3D1TUghHyWXDhEqeWhzWRgWJPy44pr7VkhX%2Bv_-nCph6GWgA%40mail.gmail.com>
Precedence: bulk

--0000000000009e8f6f061a6dcbc5
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sun, 9 Jun 2024 at 09:45, Lok P <loknath.73@gmail.com> wrote:

>
> On Sat, Jun 8, 2024 at 7:03 PM veem v <veema0000@gmail.com> wrote:
>
>>
>> There is a blog below (written for Oracle) showing how index column
>> order should be chosen. It states, "*Stick the columns you do range
>> scans on last in the index, filters that get equality predicates should
>> come first.*" By that rule, we should create the PK in the order
>> (transaction_id, transaction_timestamp). The reason is that with the
>> range predicate on the leading column, the equality predicate behind it
>> can act only as a filter predicate rather than an access predicate, so
>> the scan reads more blocks and does more I/O.
>> Does this hold true in Postgres too?
>>
>>
>> https://ctandrewsayer.wordpress.com/2017/03/24/the-golden-rule-of-indexing/
>>
>
> I believe the same rule holds true in Postgres too, and the index in
> this case should be on (transaction_id, transaction_timestamp).
>
>
>>
>>
>> Additionally, there is another scenario in which we are required to add
>> another timestamp column (say create_timestamp) to the primary key along
>> with transaction_id, and we will frequently query this table with a
>> range predicate on create_timestamp. Of course, we will also have a
>> range predicate on the partition key "transaction_timestamp". But we may
>> or may not have a join/filter on transaction_id, so in this scenario we
>> should go for (create_timestamp, transaction_id, transaction_timestamp).
>> Because "transaction_timestamp" is the partition key, putting it last
>> doesn't harm us. Is this the correct order, or would another column
>> order be more appropriate?
>>
>>
>>
> In this case, the index should be on
> (create_timestamp, transaction_id, transaction_timestamp), considering
> that your queries will always have a predicate on "create_timestamp" and
> may not have transaction_id in the query predicate.
>

So in the second scenario, if we keep create_timestamp as the leading
column, does that not go against the blog's advice, i.e. not to have the
range-predicate column as the leading column of the index?
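
To make the question concrete, here is a rough toy model of what I think
the blog's rule is describing. It is only a sketch, not a measurement of
Postgres itself: a composite index is modelled as a sorted Python list of
key tuples, the "access predicate" is the contiguous slice of the index
that has to be walked, and the "filter predicate" is whatever still has to
be checked on each visited entry. The data sizes, the generic "ts" column
and the helper name scan() are all made up for illustration.

```python
from bisect import bisect_left, bisect_right

# Toy data (hypothetical): 100 transaction ids x 100 timestamps = 10,000 rows.
rows = [(txn_id, ts) for txn_id in range(100) for ts in range(100)]

# Two candidate composite indexes, modelled as sorted lists of key tuples.
idx_id_ts = sorted((txn_id, ts) for txn_id, ts in rows)  # (id, ts) order
idx_ts_id = sorted((ts, txn_id) for txn_id, ts in rows)  # (ts, id) order

INF = float("inf")


def scan(index, lo_key, hi_key, keep=lambda entry: True):
    """Walk the index slice between lo_key and hi_key (access predicate),
    apply `keep` to each visited entry (filter predicate), and return
    (entries_visited, entries_matching)."""
    start = bisect_left(index, lo_key)
    stop = bisect_right(index, hi_key)
    visited = index[start:stop]
    matching = [entry for entry in visited if keep(entry)]
    return len(visited), len(matching)


# Query 1: id = 42 AND ts BETWEEN 10 AND 19
# Equality column first: both predicates bound the slice that is walked.
eq_first = scan(idx_id_ts, (42, 10), (42, 19))
# Range column first: only ts bounds the slice; id = 42 is a filter.
range_first = scan(idx_ts_id, (10,), (19, INF),
                   keep=lambda entry: entry[1] == 42)

# Query 2: ts BETWEEN 10 AND 19 only (no predicate on id)
# Range column first: the slice contains exactly the wanted entries.
ts_lead = scan(idx_ts_id, (10,), (19, INF))
# Id first: nothing bounds the slice, so the whole index is walked.
id_lead = scan(idx_id_ts, (-INF,), (INF,),
               keep=lambda entry: 10 <= entry[1] <= 19)

print("query 1, (id, ts) index:", eq_first)     # (10, 10)
print("query 1, (ts, id) index:", range_first)  # (1000, 10)
print("query 2, (ts, id) index:", ts_lead)      # (1000, 1000)
print("query 2, (id, ts) index:", id_lead)      # (10000, 1000)
```

In this toy model the rule only matters when an equality predicate is
actually present (query 1). When the query has just the range predicate
(query 2), leading with the range column visits the fewest entries. Whether
the real planner behaves the same way on our partitioned table is exactly
what I would like to confirm.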

--0000000000009e8f6f061a6dcbc5--