Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
From: Manikandan Swaminathan <maniswami23@gmail.com>
Mime-Version: 1.0 (1.0)
Subject: Re: Postgres Query Plan using wrong index
Message-Id: <C91D3FE2-7ABC-4377-B7F1-A738930D5304@gmail.com>
Date: Wed, 2 Apr 2025 16:44:32 -0700
Cc: pgsql-general@lists.postgresql.org
To: Tom Lane <tgl@sss.pgh.pa.us>
Archived-At: <https://www.postgresql.org/message-id/C91D3FE2-7ABC-4377-B7F1-A738930D5304%40gmail.com>
Precedence: bulk

Thanks so much for your help, Tom.

Sorry, I didn't quite understand the answer, and I have a few follow-up questions. I'm new to Postgres, so I'm a bit ignorant here and would appreciate any tips on the query planner you could give.

1) Why is the query currently picking the poorly performing index? I already have an index on (col_a, col_b) that performs well. When I remove the separate index on (col_b), it correctly uses the (col_a, col_b) index and the query runs efficiently. But when both indexes are present, it chooses the slower (col_b) index instead.

2) Why would the index you suggested, (col_b, col_a), perform better than (col_a, col_b)? I would've expected the filter on col_a to come first, followed by the aggregate on col_b. In my mind, it needs to find rows matching the col_a condition before calculating MIN(col_b), and I assumed it would traverse the B-tree accordingly. I'm more used to MySQL, where I think this is called a "loose index scan". I must have a gap in my understanding of how Postgres approaches this. Thanks for your help!
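
To check my mental model, here is a toy Python sketch of the two traversal orders. It is only an assumption-laden model, not Postgres internals: each "index" is a sorted list of tuples, the data is hypothetical (independent, uniformly random columns, not my real test_table), and the function names are made up for illustration. It counts how many index entries each strategy reads for MIN(col_b) WHERE col_a > threshold.

```python
import bisect
import math
import random

# Toy model only: each "index" is a Python list of tuples sorted by its key
# columns. Real B-trees are paged structures; this just mimics key order.
# Hypothetical data: independent, uniformly random columns (col_a, col_b).
random.seed(42)
rows = [(random.randint(0, 5000), random.randint(0, 1_000_000))
        for _ in range(100_000)]


def min_b_using_a_b_order(rows, threshold):
    """Index on (col_a, col_b): seek to the first col_a > threshold, then read
    every remaining entry, since the smallest col_b may sit in any col_a group."""
    idx = sorted(rows)  # ordered by col_a, then col_b
    start = bisect.bisect_right(idx, (threshold, math.inf))
    matches = idx[start:]
    best = min((b for _, b in matches), default=None)
    return best, len(matches)  # (MIN(col_b), index entries read)


def min_b_using_b_a_order(rows, threshold):
    """Index on (col_b, col_a): walk entries in col_b order and stop at the
    first one whose col_a passes the filter; its col_b is the minimum."""
    idx = sorted((b, a) for a, b in rows)  # ordered by col_b, then col_a
    for read, (b, a) in enumerate(idx, start=1):
        if a > threshold:
            return b, read
    return None, len(idx)  # no match: the whole index was walked


for threshold in (4996, 5000):
    print(threshold,
          "(col_a, col_b):", min_b_using_a_b_order(rows, threshold),
          "(col_b, col_a):", min_b_using_b_a_order(rows, threshold))
```

In this toy, neither order always wins: the first strategy's cost grows with the number of matching rows, while the second's depends on how far into col_b order the first match sits (the whole index when nothing matches, as with col_a > 5000 on this made-up data).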

3) Why does the planner choose the better-performing (col_a, col_b) index when the filter is col_a > 5000, but switch to the slower (col_b) index when the filter is not at the edge of the range, like col_a > 4996? For reference, here's the query plan when filtering for col_a > 5000. It uses the correct index on (col_a, col_b).

postgres=# explain analyze select min(col_b) from test_table  where col_a > 5000;

 Aggregate  (cost=4.46..4.46 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1)
   ->  Index Only Scan using idx_col_b_a on test_table  (cost=0.43..4.45 rows=1 width=4) (actual time=0.004..0.005 rows=0 loops=1)
         Index Cond: (col_a > 5000)
         Heap Fetches: 0
 Planning Time: 2.279 ms
 Execution Time: 0.028 ms
(6 rows)


>
> On Apr 1, 2025, at 5:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Manikandan Swaminathan <maniswami23@gmail.com> writes:
>> 4. When running the following query, I would expect the index "idx_col_b_a"
>> to be used: select min(col_b) from test_table  where col_a > 4996.
>> I have a range-based filter on col_a, and am aggregating the result with
>> min(col_b). Both columns are covered by "idx_col_b_a".
>
> They may be covered, but sort order matters, and that index has the
> wrong sort order to help with this query.  Try
>
> create index on test_table(col_b, col_a);
>
>          regards, tom lane