public inbox for [email protected]
help / color / mirror / Atom feedFrom: Costa Alexoglou <[email protected]>
To: Artur Zakirov <[email protected]>
Cc: [email protected]
Subject: Re: Performance regression when adding LIMIT 1 to a query
Date: Mon, 7 Apr 2025 17:34:47 +0200
Message-ID: <CAJ+5Ff6-gwJ+xYgM9fefx+T6T5pSfbo2LKbiui3s3ZXUmjatPQ@mail.gmail.com> (raw)
In-Reply-To: <CAKNkYnwL1uWBtjDgktU4BcvcZScgXyF_yme=+DfVjox6preqkA@mail.gmail.com>
References: <CAJ+5Ff7s1NxcyZ4x-FdrygRT0KMuyROVPrP+Pm_1Z2DwMdv07Q@mail.gmail.com>
<CAKNkYnwL1uWBtjDgktU4BcvcZScgXyF_yme=+DfVjox6preqkA@mail.gmail.com>
On Mon, Apr 7, 2025 at 4:06 PM Artur Zakirov <[email protected]> wrote:
> On Mon, 7 Apr 2025 at 14:45, Costa Alexoglou <[email protected]> wrote:
> > ...
> > with a plan:
> > ```
> > Gather Merge (cost=115584.47..118515.35 rows=25120 width=824) (actual
> time=46.004..74.267 rows=29653 loops=1)
> > Workers Planned: 2
> > Workers Launched: 2
> > -> Sort (cost=114584.45..114615.85 rows=12560 width=824) (actual
> time=41.200..47.322 rows=9884 loops=3)
> > Sort Key: id
> > Sort Method: external merge Disk: 16360kB
> > Worker 0: Sort Method: external merge Disk: 15552kB
> > Worker 1: Sort Method: external merge Disk: 14536kB
> > -> Parallel Bitmap Heap Scan on databases_metrics
> (cost=990.77..109175.83 rows=12560 width=824) (actual time=3.326..14.295
> rows=9884 loops=3)
> > Recheck Cond: ((db_instance_id =
> 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
> 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
> 10:00:00+00'::timestamp with time zone))"},
> > Filter: ((metrics -> 'perf_average_query_runtime'::text)
> IS NOT NULL)
> > Rows Removed by Filter: 68
> > Heap Blocks: exact=4272
> > -> Bitmap Index Scan on
> idx_databases_metrics_instance_date_custom_created_debugging
> (cost=0.00..983.24 rows=30294 width=0) (actual time=3.786.786 rows=29856
> loops=1)"},
> > Index Cond: ((db_instance_id =
> 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
> 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
> 10:00:00+00'::timestamp with time zone))"},
> > ...
> > With a plan:
> > ```
> > Limit (cost=0.43..229.66 rows=1 width=824) (actual
> time=7538.004..7538.005 rows=1 loops=1)
> > -> Index Scan using databases_metrics_pkey on databases_metrics
> (cost=0.43..6909156.38 rows=30142 width=824) (actual
> time=7538.002..7538.003 rows=1 loops=1)
> > Filter: ((created_at >= '2023-03-15 10:00:00+00'::timestamp with
> time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time
> zone) A((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL) AND
> (db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid))"},
> > Rows Removed by Filter: 10244795
> > Planning Time: 0.128 ms
> > Execution Time: 7538.032 ms
> > ```
>
> On your second query Postgres uses the index "databases_metrics_pkey".
> I assume that it is built using the "id" column. It could be very fast
> with the statement "ORDER BY ... LIMIT", but due to the additional
> filter Postgres firstly has to remove 10mln rows, which doesn't
> satisfy the filter, only to reach one single row.
>
> On the first query Postgres has to read and sort only 29k rows using
> the index "idx_databases_metrics_instance_date_custom_created_debugging",
> which is better suited for the used filter if it includes the columns
> used in the filter.
>
> I'm not sure why Postgres chooses the index "databases_metrics_pkey".
> Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE
> on the table?
>
> --
> Kind regards,
> Artur
>
> Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE
on the table
Yes, I just tried this, nothing changed.
I found this post:
https://bohanzhang.me/assets/blogs/order_by_limit/order_by_limit.html
And when applied the suggestion to add `+0` to the "order by" it worked
faster:
```
SELECT "databases_metrics"."metrics"
FROM
"databases_metrics"
WHERE
(
"databases_metrics"."created_at" >= '2023-03-15
10:00:00+00:00'::timestamptz
AND "databases_metrics"."db_instance_id" =
'c4c97a60-b88e-4cd3-a2f1-random-uuid'::UUID
AND "databases_metrics"."created_at" <= '2025-04-03
10:00:00+00:00'::timestamptz
AND (
metrics -> 'perf_average_query_runtime' IS NOT NULL
)
)
ORDER BY
"databases_metrics"."id"+0 ASC LIMIT 1;
```
the plan:
```
Limit (cost=97195.71..97195.71 rows=1 width=821) (actual
time=38.084..38.086 rows=1 loops=1)
-> Sort (cost=97195.71..97261.91 rows=26482 width=821) (actual
time=38.083..38.084 rows=1 loops=1)
Sort Key: ((id + 0))
Sort Method: top-N heapsort Memory: 27kB
-> Bitmap Heap Scan on databases_metrics (cost=867.87..97063.30
rows=26482 width=821) (actual time=4.686..30.036 rows=29653 loops=1)
Recheck Cond: ((db_instance_id =
'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
10:00:00+00'::timestamp with time zone))
Filter: ((metrics -> 'perf_average_query_runtime'::text) IS
NOT NULL)
Rows Removed by Filter: 203
Heap Blocks: exact=13492
-> Bitmap Index Scan on
idx_databases_metrics_instance_date_custom_created_debugging
(cost=0.00..861.25 rows=26615 width=0) (actual time=2.695..2.696
rows=29856 loops=1)
Index Cond: ((db_instance_id =
'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
10:00:00+00'::timestamp with time zone))
Planning Time: 0.129 ms
Execution Time: 38.121 ms
```
view thread (2+ messages)
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: Re: Performance regression when adding LIMIT 1 to a query
In-Reply-To: <CAJ+5Ff6-gwJ+xYgM9fefx+T6T5pSfbo2LKbiui3s3ZXUmjatPQ@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox