public inbox for [email protected]
help / color / mirror / Atom feedFrom: David Rowley <[email protected]>
To: Dimitrios Apostolou <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: [email protected]
Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
Date: Tue, 14 May 2024 02:15:45 +1200
Message-ID: <CAApHDvpn9ukU8ZfUe_jPU6roFLS2tECx1fH6sbQYnAyfW_z6DA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<CAApHDvrtTKfh7HgAyXBd3KN0s-jxiHzW7sWdm-sFEjP6fGPCkg@mail.gmail.com>
<[email protected]>
<CAApHDvo8yYvqa1+bkW_f5xHX-gmKGYfaGwH+Y_KP-=9TOuF+-g@mail.gmail.com>
<[email protected]>
On Tue, 14 May 2024 at 01:52, Dimitrios Apostolou <[email protected]> wrote:
>
> On Tue, 14 May 2024, David Rowley wrote:
> > The query does contain an ORDER BY, so if the index is not chosen to
> > provide pre-sorted input, then something has to put the results in the
> > correct order before the LIMIT is applied.
>
> The last query I tried was:
>
> SELECT DISTINCT workitem_n FROM test_runs_raw LIMIT 10;
I was looking at the original query. In that case, we have 2 ways to
remove duplicate rows with DISTINCT, "Hash Aggregate" and "Sort" ->
"Unique". Both of these will consume all of their input rows before
outputting any rows.
DISTINCT with LIMIT is a special case that we don't have a good
operator for. In theory, we could have some "Hash Distinct" node type
that was less eager to consume all of its input rows. When invoked
"Hash Distinct" could consume input rows until it found one that
didn't exist in the hash table. I've no idea how that would work when
we exceed work_mem. However, most queries with a LIMIT will have an
ORDER BY, so such a node likely wouldn't get much use.
David
view thread (5+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected]
Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions
In-Reply-To: <CAApHDvpn9ukU8ZfUe_jPU6roFLS2tECx1fH6sbQYnAyfW_z6DA@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox