From: Tom Lane <[email protected]>
To: David Rowley <[email protected]>
Cc: ma lz <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Why not do distinct before SetOp
Date: Mon, 04 Nov 2024 10:18:04 -0500
Message-ID: <[email protected]>
In-Reply-To: <CAApHDvqdDwEXxhZLTwsHkWnvpvVSYT2OXSzfxRrs2p5xudr9fw@mail.gmail.com>
References: <TYCPR01MB63514B9C70EF2A80F0D43394F2512@TYCPR01MB6351.jpnprd01.prod.outlook.com>
<CAApHDvqdDwEXxhZLTwsHkWnvpvVSYT2OXSzfxRrs2p5xudr9fw@mail.gmail.com>
David Rowley <[email protected]> writes:
> On Mon, 4 Nov 2024 at 22:52, ma lz <[email protected]> wrote:
>> select distinct a from t1 intersect select distinct a from t1; -- this is faster than origin sql
> No, the planner does not attempt that optimisation. INTERSECT really
> isn't very well optimised.
It's not really obvious to me why adding DISTINCT would make it
faster. Seems like having two layers of plan nodes checking for
duplicate rows ought to be a loss. Maybe we need to do some
micro-optimization in or near LookupTupleHashEntry.

A different idea that occurred to me while looking at this is:
why have we got all this machinery to add and check a flag
column, rather than arranging things so that the two input
relations are "outer" and "inner" children of the SetOp?
It's possible some of the performance difference reported here
is due to having to pass more tuples through the SubqueryScan
node (with its projection to add the flag) and Append node,
but we could remove those steps entirely.
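
[Illustrative sketch, not PostgreSQL executor code.] The contrast between
the flag-column arrangement and a two-child SetOp can be modelled roughly
as below. The function names and the tuple-based row representation are
assumptions made for this example only. Python's None stands in for NULL
and, like SetOp grouping, treats two NULLs as the same value.

```python
from collections import defaultdict


def intersect_with_flag(left_rows, right_rows):
    """Flag-column style: tag every row with its source, append both
    inputs into one stream, then count per distinct row."""
    # Stand-in for the projection that adds the flag, plus the Append step.
    tagged = [(row, 0) for row in left_rows] + [(row, 1) for row in right_rows]
    counts = defaultdict(lambda: [0, 0])  # row -> [seen in left, seen in right]
    for row, flag in tagged:
        counts[row][flag] += 1
    return {row for row, (n_left, n_right) in counts.items() if n_left and n_right}


def intersect_two_inputs(outer_rows, inner_rows):
    """Two-child style: build a table from the outer input, then probe it
    with the inner input. No flag column and no combined stream."""
    matched = {row: False for row in outer_rows}  # also de-duplicates outer rows
    for row in inner_rows:
        if row in matched:
            matched[row] = True
    return {row for row, hit in matched.items() if hit}
```

Both functions return the same set of rows. The second never materialises
the tagged stream and never stores rows that appear only in the inner
input.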
> If we did want to improve this area, I think the first thing we'd want
> to do is use standard join types rather than HashSetOp Intersect to
> implement INTERSECT (without ALL). To do that efficiently, we'd need
> to do a bit more work on the standard join types to have them
> efficiently support IS NOT DISTINCT FROM clauses as the join keys.
Maybe. It'd be a big project, but we do get complaints every so
often about IS NOT DISTINCT FROM predicates not being efficient,
so the benefits would be wider than just INTERSECT.
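
[Illustrative sketch, not a planner proposal.] The reason INTERSECT needs
IS NOT DISTINCT FROM rather than plain "=" as its join key is NULL
handling, which the following model tries to show. The helper names are
hypothetical, and the nested loops are only for clarity; the point under
discussion is precisely that the standard join types would need to do
this efficiently.

```python
def sql_equals(x, y):
    """Model of SQL '=': the result is unknown (None) if either side is NULL."""
    if x is None or y is None:
        return None
    return x == y


def is_not_distinct_from(x, y):
    """Model of IS NOT DISTINCT FROM: two NULLs compare as equal, and the
    result is always true or false, never unknown."""
    if x is None or y is None:
        return x is None and y is None
    return x == y


def intersect_via_semijoin(left, right):
    """INTERSECT (without ALL) as a de-duplicated semi-join on a
    null-safe key."""
    out = []
    for l in left:
        has_match = any(is_not_distinct_from(l, r) for r in right)
        already_emitted = any(is_not_distinct_from(l, o) for o in out)
        if has_match and not already_emitted:
            out.append(l)
    return out
```

With "=" as the join key, a NULL on the left would never match a NULL on
the right, so the join would drop a row that INTERSECT is required to
return.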
regards, tom lane