Re: Performance regression when adding LIMIT 1 to a query

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Costa Alexoglou <[email protected]>
To: Artur Zakirov <[email protected]>
Cc: [email protected]
Subject: Re: Performance regression when adding LIMIT 1 to a query
Date: Mon, 7 Apr 2025 17:34:47 +0200
Message-ID: <CAJ+5Ff6-gwJ+xYgM9fefx+T6T5pSfbo2LKbiui3s3ZXUmjatPQ@mail.gmail.com> (raw)
In-Reply-To: <CAKNkYnwL1uWBtjDgktU4BcvcZScgXyF_yme=+DfVjox6preqkA@mail.gmail.com>
References: <CAJ+5Ff7s1NxcyZ4x-FdrygRT0KMuyROVPrP+Pm_1Z2DwMdv07Q@mail.gmail.com>
	<CAKNkYnwL1uWBtjDgktU4BcvcZScgXyF_yme=+DfVjox6preqkA@mail.gmail.com>

On Mon, Apr 7, 2025 at 4:06 PM Artur Zakirov <[email protected]> wrote:

> On Mon, 7 Apr 2025 at 14:45, Costa Alexoglou <[email protected]> wrote:
> > ...
> > with a plan:
> > ```
> > Gather Merge  (cost=115584.47..118515.35 rows=25120 width=824) (actual
> time=46.004..74.267 rows=29653 loops=1)
> >   Workers Planned: 2
> >   Workers Launched: 2
> >   ->  Sort  (cost=114584.45..114615.85 rows=12560 width=824) (actual
> time=41.200..47.322 rows=9884 loops=3)
> >         Sort Key: id
> >         Sort Method: external merge  Disk: 16360kB
> >         Worker 0:  Sort Method: external merge  Disk: 15552kB
> >         Worker 1:  Sort Method: external merge  Disk: 14536kB
> >         ->  Parallel Bitmap Heap Scan on databases_metrics
> (cost=990.77..109175.83 rows=12560 width=824) (actual time=3.326..14.295
> rows=9884 loops=3)
> >               Recheck Cond: ((db_instance_id =
> 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
> 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
> 10:00:00+00'::timestamp with time zone))"},
> >               Filter: ((metrics -> 'perf_average_query_runtime'::text)
> IS NOT NULL)
> >               Rows Removed by Filter: 68
> >               Heap Blocks: exact=4272
> >               ->  Bitmap Index Scan on
> idx_databases_metrics_instance_date_custom_created_debugging
> (cost=0.00..983.24 rows=30294 width=0) (actual time=3.786.786 rows=29856
> loops=1)"},
> >                     Index Cond: ((db_instance_id =
> 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
> 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
> 10:00:00+00'::timestamp with time zone))"},
> > ...
> > With a plan:
> > ```
> > Limit  (cost=0.43..229.66 rows=1 width=824) (actual
> time=7538.004..7538.005 rows=1 loops=1)
> >   ->  Index Scan using databases_metrics_pkey on databases_metrics
> (cost=0.43..6909156.38 rows=30142 width=824) (actual
> time=7538.002..7538.003 rows=1 loops=1)
> >         Filter: ((created_at >= '2023-03-15 10:00:00+00'::timestamp with
> time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time
> zone) A((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL) AND
> (db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid))"},
> >         Rows Removed by Filter: 10244795
> > Planning Time: 0.128 ms
> > Execution Time: 7538.032 ms
> > ```
>
> On your second query Postgres uses the index "databases_metrics_pkey".
> I assume that it is built using the "id" column. It could be very fast
> with the statement "ORDER BY ... LIMIT", but due to the additional
> filter Postgres firstly has to remove 10mln rows, which doesn't
> satisfy the filter, only to reach one single row.
>
> On the first query Postgres has to read and sort only 29k rows using
> the index "idx_databases_metrics_instance_date_custom_created_debugging",
> which is better suited for the used filter if it includes the columns
> used in the filter.
>
> I'm not sure why Postgres chooses the index "databases_metrics_pkey".
> Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE
> on the table?
>
> --
> Kind regards,
> Artur
>


> Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE
on the table

Yes, I just tried this, nothing changed.

I found this post:
https://bohanzhang.me/assets/blogs/order_by_limit/order_by_limit.html

And when applied the suggestion to add `+0` to the "order by" it worked
faster:
```
SELECT "databases_metrics"."metrics"
FROM
  "databases_metrics"
WHERE
  (
    "databases_metrics"."created_at" >= '2023-03-15
10:00:00+00:00'::timestamptz
    AND "databases_metrics"."db_instance_id" =
'c4c97a60-b88e-4cd3-a2f1-random-uuid'::UUID
    AND "databases_metrics"."created_at" <= '2025-04-03
10:00:00+00:00'::timestamptz
    AND (
      metrics -> 'perf_average_query_runtime' IS NOT NULL
    )
  )
ORDER BY
  "databases_metrics"."id"+0 ASC LIMIT 1;
```

the plan:
```
Limit  (cost=97195.71..97195.71 rows=1 width=821) (actual
time=38.084..38.086 rows=1 loops=1)
  ->  Sort  (cost=97195.71..97261.91 rows=26482 width=821) (actual
time=38.083..38.084 rows=1 loops=1)
        Sort Key: ((id + 0))
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Bitmap Heap Scan on databases_metrics  (cost=867.87..97063.30
rows=26482 width=821) (actual time=4.686..30.036 rows=29653 loops=1)
              Recheck Cond: ((db_instance_id =
'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
10:00:00+00'::timestamp with time zone))
              Filter: ((metrics -> 'perf_average_query_runtime'::text) IS
NOT NULL)
              Rows Removed by Filter: 203
              Heap Blocks: exact=13492
              ->  Bitmap Index Scan on
idx_databases_metrics_instance_date_custom_created_debugging
 (cost=0.00..861.25 rows=26615 width=0) (actual time=2.695..2.696
rows=29856 loops=1)
                    Index Cond: ((db_instance_id =
'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15
10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03
10:00:00+00'::timestamp with time zone))
Planning Time: 0.129 ms
Execution Time: 38.121 ms
```

view thread (2+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Performance regression when adding LIMIT 1 to a query
  In-Reply-To: <CAJ+5Ff6-gwJ+xYgM9fefx+T6T5pSfbo2LKbiui3s3ZXUmjatPQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox