Re: Performance regression when adding LIMIT 1 to a query

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Artur Zakirov <[email protected]>
To: Costa Alexoglou <[email protected]>
Cc: [email protected]
Subject: Re: Performance regression when adding LIMIT 1 to a query
Date: Mon, 7 Apr 2025 16:05:59 +0200
Message-ID: <CAKNkYnwL1uWBtjDgktU4BcvcZScgXyF_yme=+DfVjox6preqkA@mail.gmail.com> (raw)
In-Reply-To: <CAJ+5Ff7s1NxcyZ4x-FdrygRT0KMuyROVPrP+Pm_1Z2DwMdv07Q@mail.gmail.com>
References: <CAJ+5Ff7s1NxcyZ4x-FdrygRT0KMuyROVPrP+Pm_1Z2DwMdv07Q@mail.gmail.com>

On Mon, 7 Apr 2025 at 14:45, Costa Alexoglou <[email protected]> wrote:
> ...
> with a plan:
> ```
> Gather Merge  (cost=115584.47..118515.35 rows=25120 width=824) (actual time=46.004..74.267 rows=29653 loops=1)
>   Workers Planned: 2
>   Workers Launched: 2
>   ->  Sort  (cost=114584.45..114615.85 rows=12560 width=824) (actual time=41.200..47.322 rows=9884 loops=3)
>         Sort Key: id
>         Sort Method: external merge  Disk: 16360kB
>         Worker 0:  Sort Method: external merge  Disk: 15552kB
>         Worker 1:  Sort Method: external merge  Disk: 14536kB
>         ->  Parallel Bitmap Heap Scan on databases_metrics  (cost=990.77..109175.83 rows=12560 width=824) (actual time=3.326..14.295 rows=9884 loops=3)
>               Recheck Cond: ((db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time zone))"},
>               Filter: ((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL)
>               Rows Removed by Filter: 68
>               Heap Blocks: exact=4272
>               ->  Bitmap Index Scan on idx_databases_metrics_instance_date_custom_created_debugging  (cost=0.00..983.24 rows=30294 width=0) (actual time=3.786.786 rows=29856 loops=1)"},
>                     Index Cond: ((db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time zone))"},
> ...
> With a plan:
> ```
> Limit  (cost=0.43..229.66 rows=1 width=824) (actual time=7538.004..7538.005 rows=1 loops=1)
>   ->  Index Scan using databases_metrics_pkey on databases_metrics  (cost=0.43..6909156.38 rows=30142 width=824) (actual time=7538.002..7538.003 rows=1 loops=1)
>         Filter: ((created_at >= '2023-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time zone) A((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL) AND (db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid))"},
>         Rows Removed by Filter: 10244795
> Planning Time: 0.128 ms
> Execution Time: 7538.032 ms
> ```

On your second query Postgres uses the index "databases_metrics_pkey".
I assume that it is built using the "id" column. It could be very fast
with the statement "ORDER BY ... LIMIT", but due to the additional
filter Postgres firstly has to remove 10mln rows, which doesn't
satisfy the filter, only to reach one single row.

On the first query Postgres has to read and sort only 29k rows using
the index "idx_databases_metrics_instance_date_custom_created_debugging",
which is better suited for the used filter if it includes the columns
used in the filter.

I'm not sure why Postgres chooses the index "databases_metrics_pkey".
Maybe you have outdated statistics. Did you try to run VACUUM ANALYZE
on the table?

-- 
Kind regards,
Artur

view thread (2+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Performance regression when adding LIMIT 1 to a query
  In-Reply-To: <CAKNkYnwL1uWBtjDgktU4BcvcZScgXyF_yme=+DfVjox6preqkA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox