MIME-Version: 1.0
References: <CAEzWdqeGj9FcubNXegJ8PGTnXNahUhgc6T+yNFW7O12EkKR9yA@mail.gmail.com>
 <818d0359d8b629a80b55b2e068dab958fc8e0a2a.camel@cybertec.at>
In-Reply-To: <818d0359d8b629a80b55b2e068dab958fc8e0a2a.camel@cybertec.at>
From: yudhi s <learnerdatabase99@gmail.com>
Date: Mon, 16 Feb 2026 14:43:03 +0530
Message-ID: <CAEzWdqfxtEzxO10Rnr0Yw+tPJMtCuu2c2e1mr6bEzuYL1U1BvA@mail.gmail.com>
Subject: Re: Question on execution plan and suitable index
To: Laurenz Albe <laurenz.albe@cybertec.at>, Ron Johnson <ronljohnsonjr@gmail.com>, 
	Adrian Klaver <adrian.klaver@aklaver.com>, Nisarg Patel <er.nisarg@gmail.com>
Cc: pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000afae00064aed5d98"
Archived-At: <https://www.postgresql.org/message-id/CAEzWdqfxtEzxO10Rnr0Yw%2BtPJMtCuu2c2e1mr6bEzuYL1U1BvA%40mail.gmail.com>
Precedence: bulk

--000000000000afae00064aed5d98
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Mon, Feb 16, 2026 at 2:29 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

> On Mon, 2026-02-16 at 00:34 +0530, yudhi s wrote:
> > It's postgres version 17. We are having a critical UI query which runs
> > for ~7 seconds+. The requirement is to bring down the response time
> > within ~1 sec. Now in this plan, if I read this correctly, the below
> > section is consuming a significant amount of resources and should be
> > addressed, i.e. "Full scan of table "orders" and Nested loop with
> > event_audit_log table".
> >
> > Below is the query and its complete plan:-
> > https://gist.github.com/databasetech0073/f564ac23ee35d1f0413980fe4d00efa9
> >
> > I am a bit new to the indexing strategy in postgres. My question is,
> > what suitable index should we create to cater these above?
> >
> > 1) For table event_audit_log:- Should we create a composite index on
> > columns (request_id,created_at,event_comment_text), or should we create
> > the covering index, i.e. just on two columns (request_id,created_at)
> > with an "include" clause for "event_comment_text"? How and when should
> > covering indexes be used here in postgres? Want to understand from
> > experts.
> > 2) Similarly for table orders:- Should we create a covering index on
> > columns (entity_id,due_date,order_type) with include clause
> > (firm_dspt_case_id), or just a composite index
> > (entity_id,due_date,order_type)?
> > 3) Should the column used with a range operator (here created_at or
> > due_date) be the leading column in the composite index, or is it fine
> > to keep it as non-leading?
> >
> > -> *Nested Loop  (cost=50.06..2791551.71 rows=3148 width=19) (actual time=280.735..7065.313 rows=57943 loops=3)
> >      Buffers: shared hit=10014901*
> >    ->  Hash Join  (cost=49.49..1033247.35 rows=36729 width=8) (actual time=196.407..3805.755 rows=278131 loops=3)
> >          Hash Cond: ((ord.entity_id)::numeric = e.entity_id)
> >          Buffers: shared hit=755352
> >          ->  Parallel Seq Scan on orders ord  (cost=0.00..1022872.54 rows=3672860 width=16) (actual time=139.883..3152.627 rows=2944671 loops=3)
> >                Filter: ((due_date >= '2024-01-01'::date) AND (due_date <= '2024-04-01'::date) AND (order_type = ANY ('{TYPE_A,TYPE_B}'::text[])))
> >                Rows Removed by Filter: 6572678
> >                Buffers: shared hit=755208
>
> You are selecting a lot of rows, so the query will never be really cheap.
> But I agree that an index scan should be a win.
>
> If the condition on "order_type" is always the same, a partial index is
> ideal:
>
>    CREATE INDEX ON orders (due_date) WHERE order_type IN ('TYPE_A', 'TYPE_B');
>
> Otherwise, I'd create two indexes: one on "order_type" and one on
> "due_date".
>
>
>
The version is 17.7. Below are the table definitions, as pulled from the
DBeaver tool:

https://gist.github.com/databasetech0073/f22d95de18dc3f1fa54af13e7fd2ce9e

The order_type will be TYPE_A or TYPE_B in most cases, and the distribution
is shown below. So it looks like an index on this column will not help
much; correct me if I'm wrong. I am wondering why the existing index on
column "due_date" of table "orders" is not being used by the optimizer.
Should we also add the column "entity_id" to that index?

order_type  rows (approx.)
TYPE_A      25 million
TYPE_B      2 million
TYPE_C      700K
TYPE_D      200K
TYPE_E      6K
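
To put a number on "will not help much", here is a small sketch that takes the approximate counts above as literal values (they are rounded figures, not a fresh count from the table) and computes what share of rows the order_type condition keeps:

```sql
-- Approximate row counts per order_type, copied from the list above.
WITH dist(order_type, row_count) AS (
    VALUES ('TYPE_A', 25000000),
           ('TYPE_B',  2000000),
           ('TYPE_C',   700000),
           ('TYPE_D',   200000),
           ('TYPE_E',     6000)
)
SELECT sum(row_count)                                    AS total_rows,
       sum(row_count) FILTER (WHERE order_type IN ('TYPE_A', 'TYPE_B'))
                                                         AS matching_rows,
       -- Share of the table that passes the order_type filter.
       round(100.0 * sum(row_count) FILTER (WHERE order_type IN ('TYPE_A', 'TYPE_B'))
             / sum(row_count), 1)                        AS pct_matching
FROM dist;
```

With these figures the filter keeps roughly 97% of the rows, so on its own it removes very little; the due_date range is the condition that actually narrows the scan.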

And yes, the data types of the "entity_id" columns in tables "orders" and
"entity" differ. We need to fix that after analyzing the data.
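
The mismatch shows up in the plan as the cast in "Hash Cond: ((ord.entity_id)::numeric = e.entity_id)". Because the cast is applied to the orders side, a plain index on orders (entity_id) would not match that join condition as written. Below is a rough sketch of how this could be checked and worked around. The table name "entity", the index name, and the choice of which column to change are assumptions, not taken from the actual schema:

```sql
-- Assumption: the joined tables are named "orders" and "entity".
-- List the declared types of both entity_id columns.
SELECT table_name, column_name, data_type, numeric_precision, numeric_scale
FROM information_schema.columns
WHERE column_name = 'entity_id'
  AND table_name IN ('orders', 'entity');

-- Option A (hypothetical, left commented out): align the types so that no
-- cast is needed. This rewrites the table under an exclusive lock, so it
-- needs a maintenance window and a check that all values fit the new type.
-- ALTER TABLE orders ALTER COLUMN entity_id TYPE numeric;

-- Option B (hypothetical index name): index the same expression the plan
-- uses, combined with the due_date range and the partial predicate that
-- was suggested earlier in this thread.
CREATE INDEX orders_entity_numeric_due_date_idx
    ON orders ((entity_id::numeric), due_date)
    WHERE order_type IN ('TYPE_A', 'TYPE_B');
```

Whether the planner actually picks such an index would still have to be confirmed with EXPLAIN (ANALYZE, BUFFERS) on the real query.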

Also, the highlighted Nested Loop above shows ~10M shared buffer hits (which
would be ~70GB+ if we count each hit as one 8K block). Does that mean that,
apart from the full scan of the "orders" table, the main resource consumer
here is the scan of "event_audit_log"? What is the best way to improve
this? Currently this table is scanned through a unique index on column
"request_id".
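
As a back-of-the-envelope check on those numbers: the buffer count reported for a plan node includes the buffers of its child nodes, so the share belonging to the inner side of the Nested Loop is the difference between the two "shared hit" values in the excerpt. The sketch below assumes that the buffer counts are totals across the parallel workers, that the row counts are per-loop averages (loops=3), and that the inner side consists only of the event_audit_log lookup, which the excerpt alone does not show:

```sql
SELECT
    -- Hits attributable to the inner side of the Nested Loop.
    10014901 - 755352                                        AS inner_side_hits,
    -- Volume if every hit is one 8 kB page (the default block size).
    pg_size_pretty((10014901 - 755352)::bigint * 8192)       AS inner_side_volume,
    -- Average hits per outer row: 278131 rows per loop times 3 loops.
    round((10014901 - 755352)::numeric / (278131 * 3), 1)    AS hits_per_outer_row;
```

That comes to about 9.26 million hits, or roughly 70 GiB of page accesses. These are hits in shared buffers rather than disk reads, and the same page is counted every time it is touched, so the figure measures repeated lookups, not the amount of distinct data.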

Regards
Yudhi

--000000000000afae00064aed5d98--