Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u1oV7-008W0V-0d for pgsql-general@arkaria.postgresql.org; Mon, 07 Apr 2025 15:35:05 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1u1oV5-000Ipp-9O for pgsql-general@arkaria.postgresql.org; Mon, 07 Apr 2025 15:35:03 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u1oV4-000Ipg-JF for pgsql-general@lists.postgresql.org; Mon, 07 Apr 2025 15:35:03 +0000 Received: from mail-ua1-x929.google.com ([2607:f8b0:4864:20::929]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1u1oV2-003Umr-1F for pgsql-general@lists.postgresql.org; Mon, 07 Apr 2025 15:35:01 +0000 Received: by mail-ua1-x929.google.com with SMTP id a1e0cc1a2514c-86b9d9b02cbso1899931241.1 for ; Mon, 07 Apr 2025 08:35:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dbtune.com; s=google; t=1744040099; x=1744644899; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=sOXvsm6ADOoZn0UlqZCs6oPMEgmRm8QHjhnYJeNqgeU=; b=FR8+Vh52RdPVggQjtmnZ1KeYgI6M0qvlLWQ+tnz7Xw+bcOtjqJFHScwHTqcPUZh7w4 VxUcBD3HQ0lzfr1NrZP8ataaq+IIgg3jJfVNpSYNe068DZYaZgWh7SW56ZickspfUFLx byI8+yK7lSRsWguHtd29+dMk0kmiOhQyyyYGZ6UMiekKSOAR+fr/a8qCLO6k7UpU60Ou AlO8RcgpWjx4agxNSc/dnjy2fh23CAv3ihuMDwXS13v+/hVIY0nwRsfu4Qqqa6eRReMn 5cCKPHiSPsdM6Kpeg75daZQsiwMcbsSIJKlKWd8NvBHnvVFtLFidVcN7OEq0ihW/63Od pM7Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1744040099; x=1744644899; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=sOXvsm6ADOoZn0UlqZCs6oPMEgmRm8QHjhnYJeNqgeU=; b=FidxCVwPX5zS++W8EDXbh4fweBPbplanMsTWzLtD1h8GJIo8WZ+OryQYoQjvTvS4+X +W6eG4uZtZnyFV0vxSvbqU1IUpoA+Dd0IHA97Knl8hg5HD+8UU21r+l/SWl0gmTxlVCG v8iqRVQI/KPX/ji3X6YeQwYtli4AeWotzbr6LgPOaoEmxWu2Q8k7PYEm0KTLssoH3zQI BkMt3RRFY8ZqkqYKB065rKyMcfs9UB7Ba46qIw2Kd+I+pddHMlNcWS/B9DOa59sYLLYu c1eRobaZs7PYj5uc6YWA2uH1ERSt3TpCGx0sx8dzjiLJIlYmQNqtgxHsQc49hAnkZHSD nX8w== X-Gm-Message-State: AOJu0Yw3KXhs31z2s1xJp9+gUL/p9AihWpyHt44QIEiv3z/BqkHNSPJM 4dG38WAAcuu8RRJeryBFE7i/13b7wIfcKExCGND33PZ20N/ynmxdjx2/cGEdEpEp2c7Amo8EjGE dVKn4urMZSP6hw6txdCFCBkYIMw+FY3FIpXhy9A== X-Gm-Gg: ASbGnctY3PIN4z5rBooxa8/wmhJT9++1N9Bo3ZS8auStzIGUaDtPoAckJSkWaLZfXNa 5MvUpvMLt7U1Lh8uj2NHHjt/ub2rdTH+F44qiIkdG2+j1ngPm4L3GppihB1Ld2zV/q7LvKPsyqK VhmZ7J/ukP/1VNCmTym2gBDqJLyACK X-Google-Smtp-Source: AGHT+IFFW3Jta79cqtSi6lszAkrDmIQuh8xyKEH93S/iPkyMubsdXbjnSsvMchezQOry9YBZSQLv7/jtfhBahnJzv4A= X-Received: by 2002:a05:6122:1349:b0:520:5f0a:b5a5 with SMTP id 71dfb90a1353d-5276452c8b9mr10029268e0c.6.1744040099155; Mon, 07 Apr 2025 08:34:59 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Costa Alexoglou Date: Mon, 7 Apr 2025 17:34:47 +0200 X-Gm-Features: ATxdqUEUkkwGmiK2uRnuIFHc49dlpfSyCcvR9Nogg_9EdCUG5ggL-a2133CS_Ps Message-ID: Subject: Re: Performance regression when adding LIMIT 1 to a query To: Artur Zakirov Cc: pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="00000000000095b1f2063231fa97" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000095b1f2063231fa97 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Apr 7, 2025 at 4:06=E2=80=AFPM Artur Zakirov wr= ote: > On Mon, 7 Apr 2025 at 14:45, Costa Alexoglou wrote: > > ... > > with a plan: > > ``` > > Gather Merge (cost=3D115584.47..118515.35 rows=3D25120 width=3D824) (a= ctual > time=3D46.004..74.267 rows=3D29653 loops=3D1) > > Workers Planned: 2 > > Workers Launched: 2 > > -> Sort (cost=3D114584.45..114615.85 rows=3D12560 width=3D824) (act= ual > time=3D41.200..47.322 rows=3D9884 loops=3D3) > > Sort Key: id > > Sort Method: external merge Disk: 16360kB > > Worker 0: Sort Method: external merge Disk: 15552kB > > Worker 1: Sort Method: external merge Disk: 14536kB > > -> Parallel Bitmap Heap Scan on databases_metrics > (cost=3D990.77..109175.83 rows=3D12560 width=3D824) (actual time=3D3.326.= .14.295 > rows=3D9884 loops=3D3) > > Recheck Cond: ((db_instance_id =3D > 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >=3D '2023-0= 3-15 > 10:00:00+00'::timestamp with time zone) AND (created_at <=3D '2025-04-03 > 10:00:00+00'::timestamp with time zone))"}, > > Filter: ((metrics -> 'perf_average_query_runtime'::text) > IS NOT NULL) > > Rows Removed by Filter: 68 > > Heap Blocks: exact=3D4272 > > -> Bitmap Index Scan on > idx_databases_metrics_instance_date_custom_created_debugging > (cost=3D0.00..983.24 rows=3D30294 width=3D0) (actual time=3D3.786.786 row= s=3D29856 > loops=3D1)"}, > > Index Cond: ((db_instance_id =3D > 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >=3D '2023-0= 3-15 > 10:00:00+00'::timestamp with time zone) AND (created_at <=3D '2025-04-03 > 10:00:00+00'::timestamp with time zone))"}, > > ... > > With a plan: > > ``` > > Limit (cost=3D0.43..229.66 rows=3D1 width=3D824) (actual > time=3D7538.004..7538.005 rows=3D1 loops=3D1) > > -> Index Scan using databases_metrics_pkey on databases_metrics > (cost=3D0.43..6909156.38 rows=3D30142 width=3D824) (actual > time=3D7538.002..7538.003 rows=3D1 loops=3D1) > > Filter: ((created_at >=3D '2023-03-15 10:00:00+00'::timestamp w= ith > time zone) AND (created_at <=3D '2025-04-03 10:00:00+00'::timestamp with = time > zone) A((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL) AND > (db_instance_id =3D 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid))"}, > > Rows Removed by Filter: 10244795 > > Planning Time: 0.128 ms > > Execution Time: 7538.032 ms > > ``` > > On your second query Postgres uses the index "databases_metrics_pkey". > I assume that it is built using the "id" column. It could be very fast > with the statement "ORDER BY ... LIMIT", but due to the additional > filter Postgres firstly has to remove 10mln rows, which doesn't > satisfy the filter, only to reach one single row. > > On the first query Postgres has to read and sort only 29k rows using > the index "idx_databases_metrics_instance_date_custom_created_debugging", > which is better suited for the used filter if it includes the columns > used in the filter. > > I'm not sure why Postgres chooses the index "databases_metrics_pkey". > Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE > on the table? > > -- > Kind regards, > Artur > > Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE on the table Yes, I just tried this, nothing changed. I found this post: https://bohanzhang.me/assets/blogs/order_by_limit/order_by_limit.html And when applied the suggestion to add `+0` to the "order by" it worked faster: ``` SELECT "databases_metrics"."metrics" FROM "databases_metrics" WHERE ( "databases_metrics"."created_at" >=3D '2023-03-15 10:00:00+00:00'::timestamptz AND "databases_metrics"."db_instance_id" =3D 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::UUID AND "databases_metrics"."created_at" <=3D '2025-04-03 10:00:00+00:00'::timestamptz AND ( metrics -> 'perf_average_query_runtime' IS NOT NULL ) ) ORDER BY "databases_metrics"."id"+0 ASC LIMIT 1; ``` the plan: ``` Limit (cost=3D97195.71..97195.71 rows=3D1 width=3D821) (actual time=3D38.084..38.086 rows=3D1 loops=3D1) -> Sort (cost=3D97195.71..97261.91 rows=3D26482 width=3D821) (actual time=3D38.083..38.084 rows=3D1 loops=3D1) Sort Key: ((id + 0)) Sort Method: top-N heapsort Memory: 27kB -> Bitmap Heap Scan on databases_metrics (cost=3D867.87..97063.30 rows=3D26482 width=3D821) (actual time=3D4.686..30.036 rows=3D29653 loops= =3D1) Recheck Cond: ((db_instance_id =3D 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >=3D '2023-03-= 15 10:00:00+00'::timestamp with time zone) AND (created_at <=3D '2025-04-03 10:00:00+00'::timestamp with time zone)) Filter: ((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL) Rows Removed by Filter: 203 Heap Blocks: exact=3D13492 -> Bitmap Index Scan on idx_databases_metrics_instance_date_custom_created_debugging (cost=3D0.00..861.25 rows=3D26615 width=3D0) (actual time=3D2.695..2.696 rows=3D29856 loops=3D1) Index Cond: ((db_instance_id =3D 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >=3D '2023-03-= 15 10:00:00+00'::timestamp with time zone) AND (created_at <=3D '2025-04-03 10:00:00+00'::timestamp with time zone)) Planning Time: 0.129 ms Execution Time: 38.121 ms ``` --00000000000095b1f2063231fa97 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Mon, Apr 7, 2025 at 4:06=E2=80=AFPM Ar= tur Zakirov <zaartur@gmail.com&= gt; wrote:
On Mon, 7 Apr 2025 at 14:45, Costa = Alexoglou <costa@d= btune.com> wrote:
> ...
> with a plan:
> ```
> Gather Merge=C2=A0 (cost=3D115584.47..118515.35 rows=3D25120 width=3D8= 24) (actual time=3D46.004..74.267 rows=3D29653 loops=3D1)
>=C2=A0 =C2=A0Workers Planned: 2
>=C2=A0 =C2=A0Workers Launched: 2
>=C2=A0 =C2=A0->=C2=A0 Sort=C2=A0 (cost=3D114584.45..114615.85 rows= =3D12560 width=3D824) (actual time=3D41.200..47.322 rows=3D9884 loops=3D3)<= br> >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort Key: id
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort Method: external merge=C2=A0 Dis= k: 16360kB
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Worker 0:=C2=A0 Sort Method: external= merge=C2=A0 Disk: 15552kB
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Worker 1:=C2=A0 Sort Method: external= merge=C2=A0 Disk: 14536kB
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0->=C2=A0 Parallel Bitmap Heap Scan= on databases_metrics=C2=A0 (cost=3D990.77..109175.83 rows=3D12560 width=3D= 824) (actual time=3D3.326..14.295 rows=3D9884 loops=3D3)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Recheck Cond: ((= db_instance_id =3D 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND= (created_at >=3D '2023-03-15 10:00:00+00'::timestamp with time = zone) AND (created_at <=3D '2025-04-03 10:00:00+00'::timestamp w= ith time zone))"},
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Filter: ((metric= s -> 'perf_average_query_runtime'::text) IS NOT NULL)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Rows Removed by = Filter: 68
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Heap Blocks: exa= ct=3D4272
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0->=C2=A0 Bitm= ap Index Scan on idx_databases_metrics_instance_date_custom_created_debuggi= ng=C2=A0 (cost=3D0.00..983.24 rows=3D30294 width=3D0) (actual time=3D3.786.= 786 rows=3D29856 loops=3D1)"},
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0Index Cond: ((db_instance_id =3D 'c4c97a60-b88e-4cd3-a2f1-random-= uuid'::uuid) AND (created_at >=3D '2023-03-15 10:00:00+00'::= timestamp with time zone) AND (created_at <=3D '2025-04-03 10:00:00+= 00'::timestamp with time zone))"},
> ...
> With a plan:
> ```
> Limit=C2=A0 (cost=3D0.43..229.66 rows=3D1 width=3D824) (actual time=3D= 7538.004..7538.005 rows=3D1 loops=3D1)
>=C2=A0 =C2=A0->=C2=A0 Index Scan using databases_metrics_pkey on dat= abases_metrics=C2=A0 (cost=3D0.43..6909156.38 rows=3D30142 width=3D824) (ac= tual time=3D7538.002..7538.003 rows=3D1 loops=3D1)
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Filter: ((created_at >=3D '202= 3-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <=3D= '2025-04-03 10:00:00+00'::timestamp with time zone) A((metrics -&g= t; 'perf_average_query_runtime'::text) IS NOT NULL) AND (db_instanc= e_id =3D 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid))"},
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Rows Removed by Filter: 10244795
> Planning Time: 0.128 ms
> Execution Time: 7538.032 ms
> ```

On your second query Postgres uses the index "databases_metrics_pkey&q= uot;.
I assume that it is built using the "id" column. It could be very= fast
with the statement "ORDER BY ... LIMIT", but due to the additiona= l
filter Postgres firstly has to remove 10mln rows, which doesn't
satisfy the filter, only to reach one single row.

On the first query Postgres has to read and sort only 29k rows using
the index "idx_databases_metrics_instance_date_custom_created_debuggin= g",
which is better suited for the used filter if it includes the columns
used in the filter.

I'm not sure why Postgres chooses the index "databases_metrics_pke= y".
Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE
on the table?

--
Kind regards,
Artur


> Maybe you have outdated statistics.= Did you try to run VACUUM ANALYZE
on the table

Yes, I just tried= this, nothing changed.

I found this post:=C2=A0https://boha= nzhang.me/assets/blogs/order_by_limit/order_by_limit.html

And wh= en applied the suggestion to add `+0` to the "order by" it worked= faster:
```
SELECT "databases_metrics"."metrics"=
FROM
=C2=A0 "databases_metrics"
WHERE
=C2=A0 (
= =C2=A0 =C2=A0 "databases_metrics"."created_at" >=3D = '2023-03-15 10:00:00+00:00'::timestamptz
=C2=A0 =C2=A0 AND "= ;databases_metrics"."db_instance_id" =3D 'c4c97a60-b88e-= 4cd3-a2f1-random-uuid'::UUID
=C2=A0 =C2=A0 AND "databases_metri= cs"."created_at" <=3D '2025-04-03 10:00:00+00:00'= ::timestamptz
=C2=A0 =C2=A0 AND (
=C2=A0 =C2=A0 =C2=A0 metrics -> = 'perf_average_query_runtime' IS NOT NULL
=C2=A0 =C2=A0 )
=C2= =A0 )
ORDER BY
=C2=A0 "databases_metrics"."id"+0 = ASC LIMIT 1;
```

the plan:
```
Limit =C2=A0(cost=3D97195.71= ..97195.71 rows=3D1 width=3D821) (actual time=3D38.084..38.086 rows=3D1 loo= ps=3D1)
=C2=A0 -> =C2=A0Sort =C2=A0(cost=3D97195.71..97261.91 rows=3D= 26482 width=3D821) (actual time=3D38.083..38.084 rows=3D1 loops=3D1)
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 Sort Key: ((id + 0))
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 Sort Method: top-N heapsort =C2=A0Memory: 27kB
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 -> =C2=A0Bitmap Heap Scan on databases_metrics =C2=A0(cost=3D867.= 87..97063.30 rows=3D26482 width=3D821) (actual time=3D4.686..30.036 rows=3D= 29653 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Rechec= k Cond: ((db_instance_id =3D 'c4c97a60-b88e-4cd3-a2f1-random-uuid':= :uuid) AND (created_at >=3D '2023-03-15 10:00:00+00'::timestamp = with time zone) AND (created_at <=3D '2025-04-03 10:00:00+00'::t= imestamp with time zone))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 Filter: ((metrics -> 'perf_average_query_runtime'::text) IS = NOT NULL)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Rows Removed = by Filter: 203
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Heap Blo= cks: exact=3D13492
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 ->= ; =C2=A0Bitmap Index Scan on idx_databases_metrics_instance_date_custom_cre= ated_debugging =C2=A0(cost=3D0.00..861.25 rows=3D26615 width=3D0) (actual t= ime=3D2.695..2.696 rows=3D29856 loops=3D1)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 Index Cond: ((db_instance_id =3D = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= =3D '2023-03-15 10:00:00+00'::timestamp with time zone) AND (create= d_at <=3D '2025-04-03 10:00:00+00'::timestamp with time zone))Planning Time: 0.129 ms
Execution Time: 38.121 ms
```
--00000000000095b1f2063231fa97--