MIME-Version: 1.0
From: Manikandan Swaminathan <maniswami23@gmail.com>
Date: Tue, 1 Apr 2025 17:12:04 -0700
Message-ID: <CAP4RKL8yqrG42oQKFSF4HH3Rpm_cHz4vaaCNTDL--TSLkYNngg@mail.gmail.com>
Subject: Postgres Query Plan using wrong index
To: pgsql-general@lists.postgresql.org
Content-Type: multipart/alternative; boundary="00000000000092dd860631c08132"
Archived-At: <https://www.postgresql.org/message-id/CAP4RKL8yqrG42oQKFSF4HH3Rpm_cHz4vaaCNTDL--TSLkYNngg%40mail.gmail.com>
Precedence: bulk

--00000000000092dd860631c08132
Content-Type: text/plain; charset="UTF-8"

Hello,

I'm running the docker postgres:17.0 image and testing the effect of adding
a new index to a table. Specifically, I wanted to verify that my new index is
actually used by looking at the output of "EXPLAIN ANALYZE". However, I found
that my index is often not used, and I would like to understand the query
planner's rationale for choosing a different one.

Reproduction steps
postgres=# select version();
                                                          version
---------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 17.0 (Debian 17.0-1.pgdg120+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
(1 row)

1. Create database

CREATE DATABASE test LOCALE_PROVIDER icu ICU_LOCALE "en-US-x-icu"
    LOCALE "en_US.utf8" TEMPLATE template0;

2. Create table and indices
CREATE TABLE test_table (
    col_a int,
    col_b INT NOT NULL
);
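-- Note: despite their names, idx_col_a_btree and idx_col_a_brin index col_b,
-- and idx_col_b_a indexes (col_a, col_b) in that order.
CREATE INDEX IF NOT EXISTS idx_col_a_btree ON test_table(col_b);
CREATE INDEX IF NOT EXISTS idx_col_a_brin ON test_table USING brin (col_b);
CREATE INDEX IF NOT EXISTS idx_col_b_a ON test_table(col_a, col_b);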

3. Load 10 million rows into table

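-- Rows are inserted in batches of 1000. Each batch gets the next col_b value;
-- col_a alternates per batch between the next integer and NULL, so col_a and
-- col_b increase together.
DO $$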
DECLARE
    batch_count INT := 0;
    b_var INT := 0;
    a_var INT := 1;
    prev_a INT := 1;
    a_null BOOLEAN := FALSE;
    batch_size INT := 1000;
BEGIN
    FOR i IN 1..10000000 LOOP
        IF batch_count = batch_size THEN
            b_var := b_var + 1;
            a_null := NOT a_null;
            IF NOT a_null THEN
                a_var := prev_a + 1;
            ELSE
                prev_a := a_var;
                a_var := NULL;
            END IF;
            batch_count := 0;
        END IF;
        INSERT INTO test_table (col_a, col_b) VALUES (a_var, b_var);
        batch_count := batch_count + 1;
    END LOOP;
END $$;
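
To confirm what the loop produces, a quick check along these lines can be
run after the load. This is only a sketch and was not part of the original
run; the second query should report the same 4000 matching rows that
EXPLAIN ANALYZE shows in step 4.

```sql
-- Overall shape: total rows, distinct values, and how many col_a are NULL.
SELECT count(*)                              AS total_rows,
       count(DISTINCT col_b)                 AS distinct_b,
       count(DISTINCT col_a)                 AS distinct_a,
       count(*) FILTER (WHERE col_a IS NULL) AS null_a_rows
FROM test_table;

-- The slice targeted by the query in step 4.
SELECT count(*) AS matching_rows, min(col_b) AS min_b
FROM test_table
WHERE col_a > 4996;
```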

4. When running the following query, I would expect the index "idx_col_b_a"
to be used: select min(col_b) from test_table  where col_a > 4996.
I have a range-based filter on col_a, and am aggregating the result with
min(col_b). Both columns are covered by "idx_col_b_a". However, explain
analyze indicates otherwise:

postgres=# explain analyze select min(col_b) from test_table  where col_a > 4996;
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=63.86..63.87 rows=1 width=4) (actual time=587.550..587.550 rows=1 loops=1)
   InitPlan 1
     ->  Limit  (cost=0.43..63.86 rows=1 width=4) (actual time=587.542..587.543 rows=1 loops=1)
           ->  Index Scan using idx_col_a_btree on test_table  (cost=0.43..259400.27 rows=4090 width=4) (actual time=587.541..587.541 rows=1 loops=1)
                 Filter: (col_a > 4996)
                 Rows Removed by Filter: 9992000
 Planning Time: 0.305 ms
 Execution Time: 587.579 ms
(8 rows)

Instead of using idx_col_b_a, it does an index scan on idx_col_a_btree
(which, despite its name, is on col_b). This is a problem because of how the
data is structured in my table: higher col_a values are associated with
higher col_b values. As a result, the index scan has to read through most of
the index before finding the first record that matches the criterion
"col_a > 4996".
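
The relationship between the two columns can be made visible with something
like the following sketch. It assumes the table lives in the default
"public" schema, and the 4990 cutoff is arbitrary. Note that pg_stats only
holds per-column statistics: its "correlation" column describes how each
column's values line up with physical row order, not how col_a relates to
col_b.

```sql
-- Per-column statistics the planner has collected for the table.
SELECT attname, null_frac, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'      -- assumption: default schema
  AND tablename  = 'test_table';

-- Which col_b values the highest col_a values actually map to.
SELECT col_a, min(col_b) AS min_b, max(col_b) AS max_b, count(*) AS n
FROM test_table
WHERE col_a > 4990               -- arbitrary cutoff for illustration
GROUP BY col_a
ORDER BY col_a;
```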

When I DROP the idx_col_a_btree index, the resulting query plan looks much
better because it uses idx_col_b_a, the index on (col_a, col_b):
postgres=# explain analyze select min(col_b) from test_table  where col_a > 4996;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=102.23..102.24 rows=1 width=4) (actual time=0.591..0.592 rows=1 loops=1)
   ->  Index Only Scan using idx_col_b_a on test_table  (cost=0.43..92.01 rows=4090 width=4) (actual time=0.021..0.341 rows=4000 loops=1)
         Index Cond: (col_a > 4996)
         Heap Fetches: 0
 Planning Time: 0.283 ms
 Execution Time: 0.613 ms
(6 rows)
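
For anyone reproducing this, the comparison does not require permanently
dropping the index. A sketch, assuming a scratch database where holding an
exclusive lock on test_table for the duration of the transaction is
acceptable:

```sql
BEGIN;
-- DROP INDEX is transactional in PostgreSQL; it locks test_table until
-- the transaction ends.
DROP INDEX idx_col_a_btree;
EXPLAIN ANALYZE SELECT min(col_b) FROM test_table WHERE col_a > 4996;
ROLLBACK;  -- the index is restored without a rebuild
```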

I tried adjusting the table statistics and random_page_cost, but neither
made a difference. Is there some nuance here that I'm missing? Why is the
query planner choosing an index that makes the query drastically slower?
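
For readers who want to repeat those experiments, they would look roughly
like this. The specific values (1000 and 1.1) are illustrative assumptions,
not taken from the runs above.

```sql
-- Raise the per-column statistics target, then refresh the statistics.
ALTER TABLE test_table ALTER COLUMN col_a SET STATISTICS 1000;
ALTER TABLE test_table ALTER COLUMN col_b SET STATISTICS 1000;
ANALYZE test_table;

-- Lower random_page_cost for this session only (the default is 4.0).
SET random_page_cost = 1.1;
EXPLAIN ANALYZE SELECT min(col_b) FROM test_table WHERE col_a > 4996;
RESET random_page_cost;
```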


--00000000000092dd860631c08132--