public inbox for [email protected]  
help / color / mirror / Atom feed
SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
17+ messages / 3 participants
[nested] [flat]

* SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
@ 2024-05-10 18:38 Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Dimitrios Apostolou @ 2024-05-10 18:38 UTC (permalink / raw)
  To: [email protected]

Hello list,

INTRO

I have a huge (multi-billion rows) table partitioned into 1000 partitions.
Around half of the partitions are full and the rest are empty, created in
advance ready to receive future incoming data. Postgres is 16.2.
Here are the relevant parts of the schema:

> \d test_runs_raw
                                  Partitioned table "public.test_runs_raw"
       Column       |            Type             | Collation | Nullable |
Default
-------------------+-----------------------------+-----------+----------+----------------------------------
  run_n             | bigint                      |           | not null | generated by default as identity
  test_case_n       | smallint                    |           | not null |
  workitem_n        | integer                     |           | not null |
  test_resulttype_n | smallint                    |           |          |
Partition key: RANGE (workitem_n)
Indexes:
     "test_runs_raw_partitioned_pkey" PRIMARY KEY, btree (workitem_n, run_n)

Each partition is made to keep entries with workitem_n in ranges
(0,20k), (20k,40k) and so on (k = kilo) up to 20000k.


PROBLEM

I noticed that the following query is very very slow (too long to wait for
it to finish):

SELECT DISTINCT workitem_n FROM test_runs_raw ORDER BY workitem_n DESC LIMIT 10;

What is remarkable, is that in 998 out of 1000 table scans it involves,
the planner does not use the index. Instead it chooses a sequential scan.
Here is the output from EXPLAIN:

  Limit  (cost=853891608.79..853891608.99 rows=10 width=4)
    ->  Unique  (cost=853891608.79..853891612.79 rows=200 width=4)
          ->  Sort  (cost=853891608.79..853891610.79 rows=800 width=4)               Sort Key: test_runs_raw.workitem_n DESC
                ->  Gather  (cost=853891488.22..853891570.22 rows=800 width=4)
                      Workers Planned: 4
                      ->  HashAggregate  (cost=853890488.22..853890490.22 rows=200 width=4)
                            Group Key: test_runs_raw.workitem_n
                            ->  Parallel Append  (cost=0.00..813118117.30 rows=16308948365 width=4)
                                  ->  Parallel Index Only Scan Backward using test_runs_raw__part_max9600k_pkey on test_runs_raw__part_max9600k test_runs_raw_480  (cost=0.57..1597355.10 rows=33623320 width=4)
                                  ->  Parallel Index Only Scan Backward using test_runs_raw__part_max10140k_pkey on test_runs_raw__part_max10140k test_runs_raw_507  (cost=0.57..1210795.63 rows=25793672 width=4)
                                  ->  Parallel Seq Scan on test_runs_raw__part_max9500k test_runs_raw_475  (cost=0.00..3037793.12 rows=64121612 width=4)
                                  ->  Parallel Seq Scan on test_runs_raw__part_max11180k test_runs_raw_559  (cost=0.00..2918875.90 rows=61612190 width=4)
[ ... 996 more sequential scans ... ]

If I remove DISTINCT then the plan changes dramatically and it runs
instantaneously:

  Limit  (cost=363.84..367.30 rows=10 width=4)
    ->  Append  (cost=363.84..22527480551.58 rows=65235793929 width=4)
          ->  Index Only Scan Backward using test_runs_raw__part_max20000k_pkey on test_runs_raw__part_max20000k test_runs_raw_1000  (cost=0.12..2.34 rows=1 width=4)
          ->  Index Only Scan Backward using test_runs_raw__part_max19980k_pkey on test_runs_raw__part_max19980k test_runs_raw_999  (cost=0.12..2.34 rows=1 width=4)
          ->  Index Only Scan Backward using test_runs_raw__part_max19960k_pkey on test_runs_raw__part_max19960k test_runs_raw_998  (cost=0.12..2.34 rows=1 width=4)
          ->  Index Only Scan Backward using test_runs_raw__part_max19940k_pkey on test_runs_raw__part_max19940k test_runs_raw_997  (cost=0.12..2.34 rows=1 width=4)
[ ... 996 more index scans ... ]

Notice how in the last plan there is no parallel scanning. Instead the
partitions are scanned sequentially, *in proper order*,
so that the plan execution stops after reading the first
10 rows in the first non-empty partition.

Why can't the same be done with DISTINCT?
Please note that the workitem_n value range is well spread into in range
(0,13M) and the table has been gradually filled within one year, so I'm
assuming the vacuum worker has worked long enough to build sane statistics
(not sure how to verify that).


REMARKS

1. I tried reproducing the problem on an artificial table with few
    partitions and few values, but I couldn't. Both queries execute fast,
    and the planner is always choosing a non-parallel index-only scan.

2. Among testing changes to various settings, I just noticed that setting
    max_parallel_workers_per_gather to 0 (from the original value of 4)
    fixes the issue! On the original huge table, disabling parallelism
    actually makes the query infinitely faster and it returns within 1s! Is
    this a bug in the planner?


Thank you,
Dimitris







^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
@ 2024-05-10 19:24 ` Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Dimitrios Apostolou @ 2024-05-10 19:24 UTC (permalink / raw)
  To: [email protected]

On Fri, 10 May 2024, Dimitrios Apostolou wrote:

> I noticed that the following query is very very slow (too long to wait for it
> to finish):
>
> SELECT DISTINCT workitem_n FROM test_runs_raw ORDER BY workitem_n DESC LIMIT 10;

Update: even the simplest SELECT DISTINCT query shows similar behaviour:

EXPLAIN
SELECT DISTINCT workitem_n FROM test_runs_raw LIMIT 10;

  Limit  (cost=724518979.52..724518979.92 rows=10 width=4)
    ->  Unique  (cost=724518979.52..724518987.52 rows=200 width=4)
          ->  Sort  (cost=724518979.52..724518983.52 rows=1600 width=4)               Sort Key: test_runs_raw.workitem_n
                ->  Gather  (cost=724518732.37..724518894.37 rows=1600 width=4)
                      Workers Planned: 4
                      ->  HashAggregate  (cost=724517732.37..724517734.37 rows=200 width=4)
                            Group Key: test_runs_raw.workitem_n
                            ->  Parallel Append  (cost=0.00..704131546.90 rows=8154474186 width=4)
                                  ->  Parallel Index Only Scan using test_runs_raw__part_max9600k_pkey on test_runs_raw__part_max9600k test_runs_raw_480  (cost=0.57..1429238.50 rows=16811660 width=4)
                                  ->  Parallel Index Only Scan using test_runs_raw__part_max10140k_pkey on test_runs_raw__part_max10140k test_runs_raw_507  (cost=0.57..1081827.27 rows=12896836 width=4)
                                  ->  Parallel Seq Scan on test_runs_raw__part_max9500k test_runs_raw_475  (cost=0.00..2717185.06 rows=32060806 width=4)
                                  ->  Parallel Seq Scan on test_runs_raw__part_max11180k test_runs_raw_559  (cost=0.00..2610814.95 rows=30806095 width=4)


It also takes ages to return, so I have to interrupt it.

I believe it should exit early, as soon as it finds 10 distinct values
(which should be rather easy even with parallel seqscans, given the
pattern followed when inserting the data).


Thanks,
Dimitris






^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
@ 2024-05-10 19:35   ` Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Dimitrios Apostolou @ 2024-05-10 19:35 UTC (permalink / raw)
  To: [email protected]



On Fri, 10 May 2024, Dimitrios Apostolou wrote:

> On Fri, 10 May 2024, Dimitrios Apostolou wrote:
>
> Update: even the simplest SELECT DISTINCT query shows similar behaviour:

Further digging into this simple query, if I force the non-parallel plan
by setting max_parallel_workers_per_gather TO 0, I see that the query
planner comes up with a cost much higher:

  Limit  (cost=363.84..1134528847.47 rows=10 width=4)
    ->  Unique  (cost=363.84..22690570036.41 rows=200 width=4)
          ->  Append  (cost=363.84..22527480551.58 rows=65235793929 width=4)
                ->  Index Only Scan using test_runs_raw__part_max20k_pkey on test_runs_raw__part_max20k test_runs_raw_1  (cost=0.12..2.34 rows=1 width=4)
                ->  Index Only Scan using test_runs_raw__part_max40k_pkey on test_runs_raw__part_max40k test_runs_raw_2  (cost=0.12..2.34 rows=1 width=4)
[...]
                ->  Index Only Scan using test_runs_raw__part_max1780k_pkey on test_runs_raw__part_max1780k test_runs_raw_89  (cost=0.57..53587294.65 rows=106088160 width=4)
                ->  Index Only Scan using test_runs_raw__part_max1800k_pkey on test_runs_raw__part_max1800k test_runs_raw_90  (cost=0.57..98943539.74 rows=96214080 width=4)
                ->  Index Only Scan using test_runs_raw__part_max1820k_pkey on test_runs_raw__part_max1820k test_runs_raw_91  (cost=0.57..97495653.34 rows=193248960 width=4)
                ->  Index Only Scan using test_runs_raw__part_max1840k_pkey on test_runs_raw__part_max1840k test_runs_raw_92  (cost=0.57..110205205.07 rows=218440928 width=4)
                ->  Index Only Scan using test_runs_raw__part_max1860k_pkey on test_runs_raw__part_max1860k test_runs_raw_93  (cost=0.57..50164056.28 rows=99431760 width=4)
[...]


The total cost on the 1st line (cost=363.84..1134528847.47) has a much
higher upper limit than the total cost when
max_parallel_workers_per_gather is 4 (cost=853891608.79..853891608.99).

This explains the planner's choice. But I wonder why the cost estimation
is so far away from reality.


Dimitris







^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
@ 2024-05-10 20:22     ` Tom Lane <[email protected]>
  2024-05-11 01:10       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-14 12:55       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  0 siblings, 2 replies; 17+ messages in thread

From: Tom Lane @ 2024-05-10 20:22 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

Dimitrios Apostolou <[email protected]> writes:
> Further digging into this simple query, if I force the non-parallel plan
> by setting max_parallel_workers_per_gather TO 0, I see that the query
> planner comes up with a cost much higher:

>   Limit  (cost=363.84..1134528847.47 rows=10 width=4)
>     ->  Unique  (cost=363.84..22690570036.41 rows=200 width=4)
>           ->  Append  (cost=363.84..22527480551.58 rows=65235793929 width=4)
> ...

> The total cost on the 1st line (cost=363.84..1134528847.47) has a much
> higher upper limit than the total cost when
> max_parallel_workers_per_gather is 4 (cost=853891608.79..853891608.99).
> This explains the planner's choice. But I wonder why the cost estimation
> is so far away from reality.

I'd say the blame lies with that (probably-default) estimate of
just 200 distinct rows.  That means the planner expects to have
to read about 5% (10/200) of the tables to get the result, and
that's making fast-start plans look bad.

Possibly an explicit ANALYZE on the partitioned table would help.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
@ 2024-05-11 01:10       ` Dimitrios Apostolou <[email protected]>
  2024-05-11 01:33         ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:35         ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
  1 sibling, 2 replies; 17+ messages in thread

From: Dimitrios Apostolou @ 2024-05-11 01:10 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]

On Fri, 10 May 2024, Tom Lane wrote:

> Dimitrios Apostolou <[email protected]> writes:
>> Further digging into this simple query, if I force the non-parallel plan
>> by setting max_parallel_workers_per_gather TO 0, I see that the query
>> planner comes up with a cost much higher:
>
>>   Limit  (cost=363.84..1134528847.47 rows=10 width=4)
>>     ->  Unique  (cost=363.84..22690570036.41 rows=200 width=4)
>>           ->  Append  (cost=363.84..22527480551.58 rows=65235793929 width=4)
>> ...
>
>> The total cost on the 1st line (cost=363.84..1134528847.47) has a much
>> higher upper limit than the total cost when
>> max_parallel_workers_per_gather is 4 (cost=853891608.79..853891608.99).
>> This explains the planner's choice. But I wonder why the cost estimation
>> is so far away from reality.
>
> I'd say the blame lies with that (probably-default) estimate of
> just 200 distinct rows.  That means the planner expects to have
> to read about 5% (10/200) of the tables to get the result, and
> that's making fast-start plans look bad.

Indeed that's an awful estimate, the table has more than 1M of unique
values in that column. Looking into pg_stat_user_tables, I can't see the
partitions having been vacuum'd or analyzed at all. I think they should
have been auto-analyzed, since they get a ton of INSERTs
(no deletes/updates though) and I have the default autovacuum settings.
Could it be that autovacuum starts, but never
finishes? I can't find something in the logs.

In any case, even after the planner decides to execute the terrible plan
with the parallel seqscans, why doesn't it finish right when it finds 10
distinct values?

>
> Possibly an explicit ANALYZE on the partitioned table would help.

Thanks, I'll save the ANALYZE as the last step; I feel it's a good
opportunity to figure out more details about how postgres works. Plus I
expect ANALYZE to last a couple of days, so I should first find quiet time
for that. :-)

Dimitris






^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:10       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
@ 2024-05-11 01:33         ` Tom Lane <[email protected]>
  2024-05-11 01:37           ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
  1 sibling, 1 reply; 17+ messages in thread

From: Tom Lane @ 2024-05-11 01:33 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: [email protected]

Dimitrios Apostolou <[email protected]> writes:
> On Fri, 10 May 2024, Tom Lane wrote:
>> I'd say the blame lies with that (probably-default) estimate of
>> just 200 distinct rows.  That means the planner expects to have
>> to read about 5% (10/200) of the tables to get the result, and
>> that's making fast-start plans look bad.

> In any case, even after the planner decides to execute the terrible plan
> with the parallel seqscans, why doesn't it finish right when it finds 10
> distinct values?

That plan can't emit anything at all till it finishes the Sort.

I do kind of wonder why it's producing both a hashagg and a Unique
step --- seems like it should do one or the other.

> Thanks, I'll save the ANALYZE as the last step; I feel it's a good
> opportunity to figure out more details about how postgres works. Plus I
> expect ANALYZE to last a couple of days, so I should first find quiet time
> for that. :-)

It really should not take too long --- it reads a sample, not the
whole table.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:10       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-11 01:33         ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
@ 2024-05-11 01:37           ` David Rowley <[email protected]>
  2024-05-13 12:27             ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: David Rowley @ 2024-05-11 01:37 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: Dimitrios Apostolou <[email protected]>; [email protected]

On Sat, 11 May 2024 at 13:33, Tom Lane <[email protected]> wrote:
> I do kind of wonder why it's producing both a hashagg and a Unique
> step --- seems like it should do one or the other.

It still needs to make the duplicate groups from parallel workers unique.

David






^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:10       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-11 01:33         ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:37           ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
@ 2024-05-13 12:27             ` Dimitrios Apostolou <[email protected]>
  2024-05-13 13:22               ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Dimitrios Apostolou @ 2024-05-13 12:27 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Tom Lane <[email protected]>; [email protected]

On Sat, 11 May 2024, David Rowley wrote:

> On Sat, 11 May 2024 at 13:33, Tom Lane <[email protected]> wrote:
>> I do kind of wonder why it's producing both a hashagg and a Unique
>> step --- seems like it should do one or the other.
>
> It still needs to make the duplicate groups from parallel workers unique.

Range partitioning of the table guarantees that, since the ranges are not
overlapping.


Dimitris







^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:10       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-11 01:33         ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:37           ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
  2024-05-13 12:27             ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
@ 2024-05-13 13:22               ` David Rowley <[email protected]>
  2024-05-13 14:07                 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: David Rowley @ 2024-05-13 13:22 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: Tom Lane <[email protected]>; [email protected]

On Tue, 14 May 2024 at 00:28, Dimitrios Apostolou <[email protected]> wrote:
>
> On Sat, 11 May 2024, David Rowley wrote:
>
> > On Sat, 11 May 2024 at 13:33, Tom Lane <[email protected]> wrote:
> >> I do kind of wonder why it's producing both a hashagg and a Unique
> >> step --- seems like it should do one or the other.
> >
> > It still needs to make the duplicate groups from parallel workers unique.
>
> Range partitioning of the table guarantees that, since the ranges are not
> overlapping.

That assumes the Append won't ever use > 1 worker per subnode, but
that's not the case for your plan as the subnodes are "Parallel".
That means all the workers could be working on the same subnode which
could result in one group being split between 2 or more workers.

Parallel Append can also run in a way that the Append child nodes will
only get 1 worker each.  However, even if that were the case for your
plan, we have no code that would skip the final aggregate phase when
the DISTINCT / GROUP contains all of the partition key columns.

David






^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:10       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-11 01:33         ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:37           ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
  2024-05-13 12:27             ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-13 13:22               ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
@ 2024-05-13 14:07                 ` Dimitrios Apostolou <[email protected]>
  2024-05-13 14:22                   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Dimitrios Apostolou @ 2024-05-13 14:07 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Tom Lane <[email protected]>; [email protected]



On Tue, 14 May 2024, David Rowley wrote:

> That assumes the Append won't ever use > 1 worker per subnode, but
> that's not the case for your plan as the subnodes are "Parallel".
> That means all the workers could be working on the same subnode which
> could result in one group being split between 2 or more workers.

Didn't think of that, makes sense!

> Parallel Append can also run in a way that the Append child nodes will
> only get 1 worker each.

How can I tell which case it is, from the EXPLAIN output (for example
the output at [1]) ?

[1] https://www.postgresql.org/message-id/69077f15-4125-2d63-733f-21ce6eac4f01%40gmx.net

Dimitris







^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:10       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-11 01:33         ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:37           ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
  2024-05-13 12:27             ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-13 13:22               ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
  2024-05-13 14:07                 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
@ 2024-05-13 14:22                   ` David Rowley <[email protected]>
  0 siblings, 0 replies; 17+ messages in thread

From: David Rowley @ 2024-05-13 14:22 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: Tom Lane <[email protected]>; [email protected]

On Tue, 14 May 2024 at 02:07, Dimitrios Apostolou <[email protected]> wrote:
>
> On Tue, 14 May 2024, David Rowley wrote:
> > Parallel Append can also run in a way that the Append child nodes will
> > only get 1 worker each.
>
> How can I tell which case it is, from the EXPLAIN output (for example
> the output at [1]) ?

IIRC, the planner does prefer to use Parallel aware child Paths when
creating a Parallel Append.  Given equivalent costs, there's no
advantage to it choosing a non-parallel aware Path.  The planner does
not have any optimisations that that would enable.  However, it is
possible that the planner *could* generate these. All the Append
subpaths would just have to all be parallel safe but not parallel
aware. You could identify them in EXPLAIN by seeing a "Parallel
Append" without the "Parallel" in front of the node names in any of
the Parallel Append's subpaths.

David






^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:10       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
@ 2024-05-11 01:35         ` David Rowley <[email protected]>
  2024-05-13 12:41           ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  1 sibling, 1 reply; 17+ messages in thread

From: David Rowley @ 2024-05-11 01:35 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: Tom Lane <[email protected]>; [email protected]

On Sat, 11 May 2024 at 13:11, Dimitrios Apostolou <[email protected]> wrote:
> Indeed that's an awful estimate, the table has more than 1M of unique
> values in that column. Looking into pg_stat_user_tables, I can't see the
> partitions having been vacuum'd or analyzed at all. I think they should
> have been auto-analyzed, since they get a ton of INSERTs
> (no deletes/updates though) and I have the default autovacuum settings.
> Could it be that autovacuum starts, but never
> finishes? I can't find something in the logs.

It's not the partitions getting analyzed you need to worry about for
an ndistinct estimate on the partitioned table. It's auto-analyze or
ANALYZE on the partitioned table itself that you should care about.

If you look at [1], it says "Tuples changed in partitions and
inheritance children do not trigger analyze on the parent table."

> In any case, even after the planner decides to execute the terrible plan
> with the parallel seqscans, why doesn't it finish right when it finds 10
> distinct values?

It will. It's just that Sorting requires fetching everything from its subnode.

David

[1] https://www.postgresql.org/docs/16/routine-vacuuming.html#VACUUM-FOR-STATISTICS






^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:10       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-11 01:35         ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
@ 2024-05-13 12:41           ` Dimitrios Apostolou <[email protected]>
  2024-05-13 12:46             ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Dimitrios Apostolou @ 2024-05-13 12:41 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Tom Lane <[email protected]>; [email protected]

On Sat, 11 May 2024, David Rowley wrote:

> On Sat, 11 May 2024 at 13:11, Dimitrios Apostolou <[email protected]> wrote:
>> Indeed that's an awful estimate, the table has more than 1M of unique
>> values in that column. Looking into pg_stat_user_tables, I can't see the
>> partitions having been vacuum'd or analyzed at all. I think they should
>> have been auto-analyzed, since they get a ton of INSERTs
>> (no deletes/updates though) and I have the default autovacuum settings.
>> Could it be that autovacuum starts, but never
>> finishes? I can't find something in the logs.
>
> It's not the partitions getting analyzed you need to worry about for
> an ndistinct estimate on the partitioned table. It's auto-analyze or
> ANALYZE on the partitioned table itself that you should care about.
>
> If you look at [1], it says "Tuples changed in partitions and
> inheritance children do not trigger analyze on the parent table."

Thanks

>
>> In any case, even after the planner decides to execute the terrible plan
>> with the parallel seqscans, why doesn't it finish right when it finds 10
>> distinct values?
>
> It will. It's just that Sorting requires fetching everything from its subnode.

Isn't it plain wrong to have a sort step in the plan than? The different
partitions contain different value ranges with no overlap, and the last
query I posted doesn't even contain an ORDER BY clause, just a DISTINCT
clause on an indexed column.

Even with bad estimates, even with seq scan instead of index scan, the
plan should be such that it concludes all parallel work as soon as it
finds the 10 distinct values. And this is actually achieved if I disable
parallel plans. Could it be a bug in the parallel plan generation?


Dimitris







^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:10       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-11 01:35         ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
  2024-05-13 12:41           ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
@ 2024-05-13 12:46             ` Dimitrios Apostolou <[email protected]>
  2024-05-13 13:34               ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
  0 siblings, 1 reply; 17+ messages in thread

From: Dimitrios Apostolou @ 2024-05-13 12:46 UTC (permalink / raw)
  To: David Rowley <[email protected]>; +Cc: Tom Lane <[email protected]>; [email protected]

On Mon, 13 May 2024, Dimitrios Apostolou wrote:

> On Sat, 11 May 2024, David Rowley wrote:
>
>>  On Sat, 11 May 2024 at 13:11, Dimitrios Apostolou <[email protected]> wrote:
>>>  Indeed that's an awful estimate, the table has more than 1M of unique
>>>  values in that column. Looking into pg_stat_user_tables, I can't see the
>>>  partitions having been vacuum'd or analyzed at all. I think they should
>>>  have been auto-analyzed, since they get a ton of INSERTs
>>>  (no deletes/updates though) and I have the default autovacuum settings.
>>>  Could it be that autovacuum starts, but never
>>>  finishes? I can't find something in the logs.
>>
>>  It's not the partitions getting analyzed you need to worry about for
>>  an ndistinct estimate on the partitioned table. It's auto-analyze or
>>  ANALYZE on the partitioned table itself that you should care about.
>>
>>  If you look at [1], it says "Tuples changed in partitions and
>>  inheritance children do not trigger analyze on the parent table."
>
> Thanks

Do I read that correctly, that I have to setup cron jobs to manually
analyze partitioned tables?


Dimitris







^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-11 01:10       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-11 01:35         ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions David Rowley <[email protected]>
  2024-05-13 12:41           ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-13 12:46             ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
@ 2024-05-13 13:34               ` David Rowley <[email protected]>
  0 siblings, 0 replies; 17+ messages in thread

From: David Rowley @ 2024-05-13 13:34 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; +Cc: Tom Lane <[email protected]>; [email protected]

On Tue, 14 May 2024 at 00:46, Dimitrios Apostolou <[email protected]> wrote:
>
> On Mon, 13 May 2024, Dimitrios Apostolou wrote:
>
> > On Sat, 11 May 2024, David Rowley wrote:
> >>  If you look at [1], it says "Tuples changed in partitions and
> >>  inheritance children do not trigger analyze on the parent table."
> >

> Do I read that correctly, that I have to setup cron jobs to manually
> analyze partitioned tables?

It means that auto-analyze won't touch it.  Periodically doing an
ANALYZE on the partitioned table is probably a good idea.

David






^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
@ 2024-05-14 12:55       ` Dimitrios Apostolou <[email protected]>
  2024-05-14 16:14         ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  1 sibling, 1 reply; 17+ messages in thread

From: Dimitrios Apostolou @ 2024-05-14 12:55 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]; David Rowley <[email protected]>



On Fri, 10 May 2024, Tom Lane wrote:

> Dimitrios Apostolou <[email protected]> writes:
>> Further digging into this simple query, if I force the non-parallel plan
>> by setting max_parallel_workers_per_gather TO 0, I see that the query
>> planner comes up with a cost much higher:
>
>>   Limit  (cost=363.84..1134528847.47 rows=10 width=4)
>>     ->  Unique  (cost=363.84..22690570036.41 rows=200 width=4)
>>           ->  Append  (cost=363.84..22527480551.58 rows=65235793929 width=4)
>> ...
>
>> The total cost on the 1st line (cost=363.84..1134528847.47) has a much
>> higher upper limit than the total cost when
>> max_parallel_workers_per_gather is 4 (cost=853891608.79..853891608.99).
>> This explains the planner's choice. But I wonder why the cost estimation
>> is so far away from reality.
>
> I'd say the blame lies with that (probably-default) estimate of
> just 200 distinct rows.  That means the planner expects to have
> to read about 5% (10/200) of the tables to get the result, and
> that's making fast-start plans look bad.
>
> Possibly an explicit ANALYZE on the partitioned table would help.

It took long but if finished:

ANALYZE
Time: 19177398.025 ms (05:19:37.398)

And it made a difference indeed, the serial plan is chosen now:

EXPLAIN SELECT DISTINCT workitem_n FROM test_runs_raw ORDER BY workitem_n DESC LIMIT 10;
  Limit  (cost=364.29..1835512.29 rows=10 width=4)
    ->  Unique  (cost=364.29..22701882164.56 rows=123706 width=4)
          ->  Append  (cost=364.29..22538472401.60 rows=65363905182 width=4)
                ->  Index Only Scan Backward using test_runs_raw__part_max20000k_pkey on test_runs_raw__part_max20000k test_runs_raw_1000  (cost=0.12..2.34 rows=1 width=4)
                ->  Index Only Scan Backward using test_runs_raw__part_max19980k_pkey on test_runs_raw__part_max19980k test_runs_raw_999  (cost=0.12..2.34 rows=1 width=4)
                ->  Index Only Scan Backward using test_runs_raw__part_max19960k_pkey on test_runs_raw__part_max19960k test_runs_raw_998  (cost=0.12..2.34 rows=1 width=4)
[...]
                ->  Index Only Scan Backward using test_runs_raw__part_max12460k_pkey on test_runs_raw__part_max12460k test_runs_raw_623  (cost=0.57..12329614.53 rows=121368496 width=4)
                ->  Index Only Scan Backward using test_runs_raw__part_max12440k_pkey on test_runs_raw__part_max12440k test_runs_raw_622  (cost=0.57..5180832.16 rows=184927264 width=4)
                ->  Index Only Scan Backward using test_runs_raw__part_max12420k_pkey on test_runs_raw__part_max12420k test_runs_raw_621  (cost=0.57..4544964.21 rows=82018824 width=4)
[...]

Overall I think there are two issues that postgres could handle better
here:

1. Avoid the need for manual ANALYZE on partitioned table

2. Create a different parallel plan, one that can exit early, when the
    LIMIT is proportionally low. I feel the partitions could be
    parallel-scanned in-order, so that the whole thing stops when one
    partition has been read.

Thank you!
Dimitris







^ permalink  raw  reply  [nested|flat] 17+ messages in thread

* Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
  2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:24 ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 19:35   ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
  2024-05-10 20:22     ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Tom Lane <[email protected]>
  2024-05-14 12:55       ` Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
@ 2024-05-14 16:14         ` Dimitrios Apostolou <[email protected]>
  0 siblings, 0 replies; 17+ messages in thread

From: Dimitrios Apostolou @ 2024-05-14 16:14 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: [email protected]; David Rowley <[email protected]>

On Tue, 14 May 2024, Dimitrios Apostolou wrote:
>
> It took long but if finished:
>
> ANALYZE
> Time: 19177398.025 ms (05:19:37.398)

I see now that default_statistics_target is globally set to 500, so this
is probably the reason it took so long. I guess with the default of 100,
it would take approximately one hour. This is much better to have in a
cron job. :-)

Dimitris








^ permalink  raw  reply  [nested|flat] 17+ messages in thread


end of thread, other threads:[~2024-05-14 16:14 UTC | newest]

Thread overview: 17+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-05-10 18:38 SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions Dimitrios Apostolou <[email protected]>
2024-05-10 19:24 ` Dimitrios Apostolou <[email protected]>
2024-05-10 19:35   ` Dimitrios Apostolou <[email protected]>
2024-05-10 20:22     ` Tom Lane <[email protected]>
2024-05-11 01:10       ` Dimitrios Apostolou <[email protected]>
2024-05-11 01:33         ` Tom Lane <[email protected]>
2024-05-11 01:37           ` David Rowley <[email protected]>
2024-05-13 12:27             ` Dimitrios Apostolou <[email protected]>
2024-05-13 13:22               ` David Rowley <[email protected]>
2024-05-13 14:07                 ` Dimitrios Apostolou <[email protected]>
2024-05-13 14:22                   ` David Rowley <[email protected]>
2024-05-11 01:35         ` David Rowley <[email protected]>
2024-05-13 12:41           ` Dimitrios Apostolou <[email protected]>
2024-05-13 12:46             ` Dimitrios Apostolou <[email protected]>
2024-05-13 13:34               ` David Rowley <[email protected]>
2024-05-14 12:55       ` Dimitrios Apostolou <[email protected]>
2024-05-14 16:14         ` Dimitrios Apostolou <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox