Re: postgresql-14 slow query

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Kenny Bachman <[email protected]>
To: [email protected]
To: Paul Smith <[email protected]>
To: [email protected]
To: [email protected]
To: [email protected]
Cc: [email protected]
Subject: Re: postgresql-14 slow query
Date: Sat, 16 Apr 2022 12:26:02 +0300
Message-ID: <CAC0w7LKXT=41NDEar4CPrLAk1rwWDwR4XZV-gdrQViFGobTZBA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAC0w7L+cxDkvgeM9vA_pPTAQ0pMo4PLuh9s-rCyR9AoLnYTL3A@mail.gmail.com>
	<[email protected]>

Thank you for your all e-mails. I want to provide more information about
the query and table.
Table count(*) is = 24M
is_false column doesn't selectivity, so is_false=0 count(*) is ~24M.
My work _mem is 64MB

Also, my original query is below. I changed the query to ( distinct on ) in
order to avoid hashaggregate.

SELECT order_history.order_id,
max(order_history.creation_date) AS c_date
FROM work.order_history
WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233,
348])) AND order_history.is_false = 0
GROUP BY order_history.order_id;

EXPLAIN (ANALYZE ,BUFFERS) SELECT order_history.order_id,
max(order_history.creation_date) AS c_date
FROM work.order_history
WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233,
348])) AND order_history.is_false = 0
GROUP BY order_history.order_id;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=572423.43..574487.66 rows=206423 width=12)
(actual time=2579.826..3394.787 rows=2049357 loops=1)
   Group Key: order_id
   Batches: 5  Memory Usage: 65585kB  Disk Usage: 56504kB
   Buffers: shared hit=350539, temp read=6850 written=12167
   ->  Gather  (cost=483661.54..568294.97 rows=825692 width=12) (actual
time=1650.341..1961.442 rows=2085884 loops=1)
         Workers Planned: 4
         Workers Launched: 4
         Buffers: shared hit=350539
         ->  Partial HashAggregate  (cost=482661.54..484725.77 rows=206423
width=12) (actual time=1641.882..1730.569 rows=417177 loops=5)
               Group Key: order_id
               Batches: 1  Memory Usage: 61457kB
               Buffers: shared hit=350539
               Worker 0:  Batches: 1  Memory Usage: 45073kB
               Worker 1:  Batches: 1  Memory Usage: 57361kB
               Worker 2:  Batches: 1  Memory Usage: 57361kB
               Worker 3:  Batches: 1  Memory Usage: 61457kB
               ->  Parallel Seq Scan on order_history
 (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.025..1279.366
rows=1378310 loops=5)
                     Filter: ((is_false = 0) AND (state_id = ANY
('{30,51,63,136,195,233,348}'::integer[])))
                     Rows Removed by Filter: 3268432
                     Buffers: shared hit=350539
 Planning:
   Buffers: shared hit=7
 Planning Time: 0.644 ms
 Execution Time: 3482.662 ms


And the execution plan after creating an index;

CREATE INDEX ON work.order_history (state_id ,order_id);
CREATE INDEX
ANALYZE work.order_history;
ANALYZE
EXPLAIN (ANALYZE ,BUFFERS) SELECT order_history.order_id,
max(order_history.creation_date) AS c_date
FROM work.order_history
WHERE (order_history.state_id = ANY (ARRAY[30, 51, 63, 136, 195, 233,
348])) AND order_history.is_false = 0
GROUP BY order_history.order_id;

                   QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=558809.02..560882.30 rows=207328 width=12) (actual
time=6050.759..7598.165 rows=2049357 loops=1)
   Group Key: order_id
   Batches: 5  Memory Usage: 65585kB  Disk Usage: 226272kB
   Buffers: shared hit=6007222 read=9864, temp read=28099 written=49004
   ->  Index Scan using order_history_state_id_order_id_idx on
order_history  (cost=0.44..524241.93 rows=6913417 width=12) (actual
time=0.079..4137.626 rows=6891551 loops=1)
         Index Cond: (state_id = ANY
('{30,51,63,136,195,233,348}'::integer[]))
         Filter: (is_false = 0)
         Rows Removed by Filter: 5301
         Buffers: shared hit=6007222 read=9864
 Planning:
   Buffers: shared hit=53 read=2
 Planning Time: 0.634 ms
 Execution Time: 7695.625 ms

Paul Smith <[email protected]>, 16 Nis 2022 Cmt, 11:47 tarihinde şunu yazdı:

> "for fun" try doing
>
> Set enable_seqscan=off
>
> And try the queries again. This will discourage it from doing sequential
> scans and use indexes if possible. If it still does a sequential scan, then
> there's some reason it can't use the indexes. If it uses indexes now, then
> the planner must think that the sequential scan would be quicker
>
> On 15 April 2022 22:00:07 Kenny Bachman <[email protected]> wrote:
>
>> Hello Team,
>>
>> How can I tune this query? It got even slower when I created the index
>> for (state_id, order_id desc). The following explain analyze output is
>> without an index. It takes 13 seconds if I create that index. Could you
>> help me?
>>
>> Thank you so much for your help.
>>
>> SELECT DISTINCT ON (order_history.order_id) order_id,
>> order_history.creation_date  AS c_date
>> FROM work.order_history WHERE (order_history.state_id = ANY (ARRAY[30,
>> 51, 63, 136, 195, 233, 348])) AND order_history.is_false = 0
>> ORDER BY order_history.order_id DESC;
>>
>> EXPLAIN ANALYZE output:
>>
>>  Unique  (cost=672007.46..1519683.55 rows=206423 width=12) (actual
>> time=1701.420..3439.095 rows=2049357 loops=1)
>>    ->  Gather Merge  (cost=672007.46..1502346.48 rows=6934827 width=12)
>> (actual time=1701.419..2989.243 rows=6891551 loops=1)
>>          Workers Planned: 4
>>          Workers Launched: 4
>>          ->  Sort  (cost=671007.40..675341.67 rows=1733707 width=12)
>> (actual time=1657.609..1799.723 rows=1378310 loops=5)
>>                Sort Key: order_id DESC
>>                Sort Method: external merge  Disk: 38960kB
>>                Worker 0:  Sort Method: external merge  Disk: 31488kB
>>                Worker 1:  Sort Method: external merge  Disk: 36120kB
>>                Worker 2:  Sort Method: external merge  Disk: 31368kB
>>                Worker 3:  Sort Method: external merge  Disk: 36152kB
>>                ->  Parallel Seq Scan on order_history
>>  (cost=0.00..473993.00 rows=1733707 width=12) (actual time=0.041..1211.485
>> rows=1378310 loops=5)
>>                      Filter: ((is_false = 0) AND (state_id = ANY
>> ('{30,51,63,136,195,233,348}'::integer[])))
>>                      Rows Removed by Filter: 3268432
>>  Planning Time: 0.405 ms
>>  Execution Time: 3510.433 ms
>>
>
>
> --
>
> Paul Smith Computer Services
> Tel: 01484 855800
> Vat No: GB 685 6987 53
>
> Sign up for news & updates <http://www.pscs.co.uk/go/subscribe;
>

view thread (8+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: postgresql-14 slow query
  In-Reply-To: <CAC0w7LKXT=41NDEar4CPrLAk1rwWDwR4XZV-gdrQViFGobTZBA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox