public inbox for [email protected]
help / color / mirror / Atom feedFrom: Dimitrios Apostolou <[email protected]>
To: David Rowley <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: [email protected]
Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Date: Tue, 14 May 2024 20:11:26 +0200 (CEST)
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAApHDvo8yYvqa1+bkW_f5xHX-gmKGYfaGwH+Y_KP-=9TOuF+-g@mail.gmail.com>
References: <[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<CAApHDvrtTKfh7HgAyXBd3KN0s-jxiHzW7sWdm-sFEjP6fGPCkg@mail.gmail.com>
<[email protected]>
<CAApHDvo8yYvqa1+bkW_f5xHX-gmKGYfaGwH+Y_KP-=9TOuF+-g@mail.gmail.com>
On Tue, 14 May 2024, David Rowley wrote:
>
> If you were to put the n_distinct_inherited estimate back to 200 and
> disable sort, you should see the costs are higher for the index plan.
> If that's not the case then there might be a bug. It seems more
> likely that due to the n_distinct estimate being so low that the
> planner thought that a large enough fraction of the rows needed to be
> read and that made the non-index plan appear cheaper.
>
> I'd be interested in seeing what the costs are for the index plan. I
> think the following will give you that (untested):
>
> alter table test_runs_raw alter column workitem_n set
> (n_distinct_inherited=200);
> analyze test_runs_raw;
I had to stop this step because it was taking too long going through all
partitions again. But it seems it had the desired effect.
> set enable_sort=0;
> explain SELECT DISTINCT workitem_n FROM test_runs_raw ORDER BY workitem_n DESC LIMIT 10;
It chooses the non-parallel index plan:
Limit (cost=365.17..1135517462.36 rows=10 width=4)
-> Unique (cost=365.17..22710342308.83 rows=200 width=4)
-> Append (cost=365.17..22546660655.46 rows=65472661350 width=4)
-> Index Only Scan Backward using test_runs_raw__part_max20000k_pkey on test_runs_raw__part_max20000k test_runs_raw_1000 (cost=0.12..2.34 rows=1 width=4)
-> Index Only Scan Backward using test_runs_raw__part_max19980k_pkey on test_runs_raw__part_max19980k test_runs_raw_999 (cost=0.12..2.34 rows=1 width=4)
[... only index scans follow]
LIMIT 100 goes for the parallel seqscan plan, that even contains a sort!
But it seams to me that the extra upper level HashAggregate step raises
the cost by an order of magnitude, from 800M to 10G, in comparison to
doing (Unique->Sort) - see plan in the next paragraph.
Limit (cost=10857220388.76..10857220389.01 rows=100 width=4)
-> Sort (cost=10857220388.76..10857220389.26 rows=200 width=4)
Sort Key: test_runs_raw.workitem_n DESC
-> HashAggregate (cost=857220379.12..857220381.12 rows=200 width=4)
Group Key: test_runs_raw.workitem_n
-> Gather (cost=857220295.12..857220377.12 rows=800 width=4)
Workers Planned: 4
-> HashAggregate (cost=857219295.12..857219297.12 rows=200 width=4)
Group Key: test_runs_raw.workitem_n
-> Parallel Append (cost=0.00..816295259.21 rows=16369614363 width=4)
-> Parallel Index Only Scan Backward using test_runs_raw__part_max9600k_pkey on test_runs_raw__part_max9600k test_runs_raw_480 (cost=0.57..1597356.30 rows=33623360 width=4)
-> Parallel Index Only Scan Backward using test_runs_raw__part_max10140k_pkey on test_runs_raw__part_max10140k test_runs_raw_507 (cost=0.57..1210806.37 rows=25794030 width=4)
-> Parallel Seq Scan on test_runs_raw__part_max9500k test_runs_raw_475 (cost=0.00..3037800.88 rows=64122388 width=4)
-> Parallel Seq Scan on test_runs_raw__part_max11180k test_runs_raw_559 (cost=0.00..2918865.36 rows=61611136 width=4)
[... only seqscans follow]
If I re-enable sort, then it goes for the parallel seqscan plan even with LIMIT 10:
SET SESSION enable_sort TO TRUE;
EXPLAIN SELECT DISTINCT workitem_n FROM test_runs_raw ORDER BY workitem_n DESC LIMIT 10;
Limit (cost=857166256.39..857166256.59 rows=10 width=4)
-> Unique (cost=857166256.39..857166260.39 rows=200 width=4)
-> Sort (cost=857166256.39..857166258.39 rows=800 width=4)
Sort Key: test_runs_raw.workitem_n DESC
-> Gather (cost=857166135.82..857166217.82 rows=800 width=4)
Workers Planned: 4
-> HashAggregate (cost=857165135.82..857165137.82 rows=200 width=4)
Group Key: test_runs_raw.workitem_n
-> Parallel Append (cost=0.00..816243567.24 rows=16368627432 width=4)
-> Parallel Index Only Scan Backward using test_runs_raw__part_max9600k_pkey on test_runs_raw__part_max9600k test_runs_raw_480 (cost=0.57..1597356.30 rows=33623360 width=4)
-> Parallel Index Only Scan Backward using test_runs_raw__part_max10140k_pkey on test_runs_raw__part_max10140k test_runs_raw_507 (cost=0.57..1210806.37 rows=25794030 width=4)
-> Parallel Seq Scan on test_runs_raw__part_max9500k test_runs_raw_475 (cost=0.00..3037800.88 rows=64122388 width=4)
-> Parallel Seq Scan on test_runs_raw__part_max11180k test_runs_raw_559 (cost=0.00..2918865.36 rows=61611136 width=4)
[... only seqscans follow]
So in order of higher to lower cost, we have the following alternatives:
1. non-parallel index scan (800M)
2. parallel seqscan with sort (1.3G)
3. parallel seqscan without sort but actually has a sort (10G assuming it's the same as for LIMIT 100)
>
> -- undo
> alter table test_runs_raw alter column workitem_n set (n_distinct_inherited=-1);
I believe I need to set it to 0 to be back to defaults.
> reset enable_sort;
>
Regards,
Dimitris
view thread (5+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected]
Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox