public inbox for [email protected]
From: Richard Guo <[email protected]>
To: Tom Lane <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Tender Wang <[email protected]>
Cc: Paul George <[email protected]>
Cc: Andy Fan <[email protected]>
Cc: [email protected]
Subject: Re: Eager aggregation, take 3
Date: Tue, 21 Jan 2025 23:13:14 +0900
Message-ID: <CAMbWs48t2DxTKfz2-seyYxqayxnAo8b+5LV3nuPDk9gSqgLy2A@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com>
<[email protected]>
<CAMbWs49=eAd2W9jCtGhaZPPp+SOC_2rg16RTG74xAht=hkr5JQ@mail.gmail.com>
<CAMbWs49Nc4M3H+eCf1+8w8piDyEECjRb-gK_JMF4VvcyWwGEVQ@mail.gmail.com>
<CAMbWs49E_dR0nobsExsyetpnBpHObLTsQLsEbWKQLkh0omPxNg@mail.gmail.com>
<CAMbWs49B_qUiHvu2EqLHZRpLr3p_+QPBs50n2=L5ibYzniwTzA@mail.gmail.com>
<CAMbWs48KCQtDymnYi4M=Vz+WMzo3fkBxffJsyk6VX6hOXXv+VA@mail.gmail.com>
<CAMbWs49sv_MuOYqqrtmBN_oYf8VSQ2BXDwXaTpJTn_YfwyYdWQ@mail.gmail.com>
<CAMbWs49U8Sddx_fGszPdvA3jp_nheynxaqm5Y4NqMV21VBYAuQ@mail.gmail.com>
<CAMbWs4-LwyOg9ga+NVF7yQbMi0ZsZdN1G_sO2v=YJHV18=19+A@mail.gmail.com>
<CALA8mJquG_zCJXfVwash5LKqHGtZXQmq7RfTSaRDUzGYeW=7Rw@mail.gmail.com>
<CAMbWs4_EjgcBib5+y1LYcGB3EK3Y6R+OOxGKfJo42fDovadk1g@mail.gmail.com>
<CALA8mJqe0anNM8_V6cOeOQnCHUTQggn7iOQNyQr1VaN_xMjz+w@mail.gmail.com>
<CAMbWs48eE-s-jCicC8pSVfXk8Ws-ZvUKnsw8qH-DkVBdYv0eJQ@mail.gmail.com>
<CAHewXNmYM6DvR_kaxDL0w0fz9BwKbac+TSU3QS10aA3cXHyMmA@mail.gmail.com>
<CA+TgmoaxH=P63hLYgyJJcEbMRnw3xi16d=HxFi1j-m7MhH6W_w@mail.gmail.com>
<CAMbWs4_cOnpGsywj9Jt1WAgzJLW9Rxt5X13cfGz4iN2qvZQ68g@mail.gmail.com>
<CA+Tgmob0q7bRbsFTVDMjxHE6zA4uDQLQa-s0CtwUw49V53UL_A@mail.gmail.com>
<CAMbWs4-Xru_eKBeRHFduigSGihdixFWVTR8A+dtMw7Mao+RkJA@mail.gmail.com>
<CAMbWs49dLjSSQRWeud+KSN0G531ciZdYoLBd5qktXA+3JQm_UQ@mail.gmail.com>
<CAMbWs48LXGC-Y63YtzEeM-3f0NUXWCUEMs7XwGzywXTjUNMcxQ@mail.gmail.com>
<CAMbWs48XdzvnwfTHWxQ7qK-yjvdrbwsPpqhJBuKDnO+hcbsVwA@mail.gmail.com>
<CA+TgmoaO-7RHdyJuizWChXZm7EJGvDcfoePDDEyUA-y8vTB1tg@mail.gmail.com>
<CAMbWs4-+jXRpKuFMZa08bS34-TBka3qqjVMAUjF=-1RA9BKvgg@mail.gmail.com>
<CA+TgmoZapU1y59-s3o8oPt7Hv+cxRh_34FMu6MXumomLe+U1Cw@mail.gmail.com>
<CAMbWs4_sEeeBmucBzbamBMfA9uLxVmOc_MV=ZpSyDbTcrUO_XQ@mail.gmail.com>
<CA+Tgmob4fnv57PQB0Oox86mHSJQ0vVL249eT=gqPvrMkG7h1zw@mail.gmail.com>
<CAMbWs489NYyTcCTbrUi7hPXKtNY5vHrrFcHyMRAv=CA5WsszVw@mail.gmail.com>
<CA+TgmoazmDdcc7NeTo3WM5HW3DASNP4rfZw6X+2nnQKHampOng@mail.gmail.com>
<[email protected]>
On Tue, Jan 21, 2025 at 2:57 AM Tom Lane <[email protected]> wrote:
> However, a partial-aggregation path does not generate the same data
> as an unaggregated path, no matter how fuzzy you are willing to be
> about the concept. So I'm having a very hard time accepting that
> it ought to be part of the same RelOptInfo, and thus I don't really
> buy that annotating paths with a GroupPathInfo is the way forward.
Agreed. One point I failed to make clear is that I have never
intended to put a partial-aggregation path and an unaggregated path
into the same RelOptInfo. One of the basic design choices of this
patch is that partial-aggregation paths are placed in a separate
category of RelOptInfos, which I call "grouped relations" (though I
admit that's not the best name). This ensures that we never compare a
partial-aggregation path with an unaggregated path during scan/join
planning, which is what we want, because I am certain that the two
categories of paths are not comparable.
Regarding the GroupPathInfo proposal, my intention is to add a valid
GroupPathInfo only for the partial-aggregation paths. The goal is to
ensure that partial-aggregation paths within this category are
compared only if their partial aggregations are at the same location.
To be honest, I still doubt that this is necessary. I have two main
reasons for this.
1. For a partial-aggregation path, the location where we place the
partial aggregation does not impose any restrictions on further
planning. This is different from the parameterized path case. If two
parameterized paths are equal on every other figure of merit, we will
choose the one with fewer required outer rels, as it means fewer join
restrictions on upper planning. However, for partial-aggregation
paths, we have no preference regarding the location of the partial
aggregation. For instance, for the paths "A JOIN PartialAgg(B) JOIN C"
and "PartialAgg(A JOIN B) JOIN C", if one path dominates the other on
every figure of merit, it seems to me that there's no point in keeping
the less favorable one, even though their partial aggregations are at
different join levels.
2. A partial-aggregation path of a rel essentially yields an
aggregated form of that rel's row set. The row sets yielded by paths
with different partial-aggregation locations differ primarily in the
degree to which the rows are aggregated. These sets are fundamentally
homogeneous.
In summary, I think the partial-aggregation paths of the same
"grouped relation" are comparable, regardless of the position of the
partial aggregation within the path tree. So I think we should put
them into the same RelOptInfo.
Of course, I could be very wrong about this. I would greatly
appreciate hearing others' thoughts on this.
Thanks
Richard