From: David Rowley <[email protected]>
To: Dimitrios Apostolou <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: [email protected]
Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Date: Tue, 14 May 2024 01:22:38 +1200
Message-ID: <CAApHDvoJHxrsgQm8cS=yWN2akxP=bLxuYNPCaXXWcmcG+_b1iw@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<CAApHDvqWfGdUpTUp9s7AZYSeh44aLEXoBQS7UcfKM0zKhTCkiQ@mail.gmail.com>
	<[email protected]>

On Tue, 14 May 2024 at 00:28, Dimitrios Apostolou <[email protected]> wrote:
>
> On Sat, 11 May 2024, David Rowley wrote:
>
> > On Sat, 11 May 2024 at 13:33, Tom Lane <[email protected]> wrote:
> >> I do kind of wonder why it's producing both a hashagg and a Unique
> >> step --- seems like it should do one or the other.
> >
> > It still needs to make the duplicate groups from parallel workers unique.
>
> Range partitioning of the table guarantees that, since the ranges are not
> overlapping.

That assumes the Append never uses more than one worker per subnode,
but that's not the case for your plan, as the subnodes are "Parallel".
All of the workers could be working on the same subnode, which could
result in one group being split between two or more workers.
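
As a rough illustration of the point above, here is a toy model in
Python (not PostgreSQL code). The block layout, the round-robin
hand-out of blocks, and the names partition_blocks, worker_rows and
partial_distinct are all assumptions made up for the sketch. It only
shows that when two workers scan the same subnode, each worker's
partial de-duplication can emit the same group, so a final unique step
is still needed after the gather.

```python
def partial_distinct(rows):
    """Per-worker de-duplication: only sees the rows this worker scanned."""
    return set(rows)

# One partition (one Append subnode) whose rows all belong to group 42,
# stored in three blocks. Hypothetical data for illustration only.
partition_blocks = [[42, 42], [42], [42, 42, 42]]

# A parallel scan of a single subnode shares its blocks among workers.
# Round-robin is a stand-in for whatever order workers claim blocks in.
n_workers = 2
worker_rows = [[] for _ in range(n_workers)]
for i, block in enumerate(partition_blocks):
    worker_rows[i % n_workers].extend(block)

# Each worker de-duplicates its own rows, then the results are gathered.
partials = [partial_distinct(rows) for rows in worker_rows]
gathered = [value for partial in partials for value in sorted(partial)]

# Group 42 arrives once per worker, so a final unique step is required.
final = sorted(set(gathered))

print(gathered)  # [42, 42]
print(final)     # [42]
```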

Parallel Append can also run in a way that the Append child nodes will
only get 1 worker each.  However, even if that were the case for your
plan, we have no code that would skip the final aggregate phase when
the DISTINCT / GROUP BY clause contains all of the partition key columns.
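
For contrast, a similar toy sketch (again Python, with invented
partition names and data) of the case where each child node gets
exactly one worker and the distinct column is the range partition key.
Here the per-partition results cannot overlap, so the final step would
have nothing left to remove. This only illustrates the property; as
said above, the planner has no code that takes advantage of it.

```python
# Hypothetical range partitions on the distinct column; ranges do not overlap.
partitions = {
    "p_0_100": [5, 5, 17, 99],
    "p_100_200": [100, 150, 150],
    "p_200_300": [250],
}

# One worker per partition: each worker sees every row of its partition.
partials = [set(rows) for rows in partitions.values()]
gathered = sorted(value for partial in partials for value in partial)

# No value can come from two partitions, so the gathered list is already unique.
already_unique = len(gathered) == len(set(gathered))

print(gathered)        # [5, 17, 99, 100, 150, 250]
print(already_unique)  # True
```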

David





