Re: Eager aggregation, take 3

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Robert Haas <[email protected]>
To: Richard Guo <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Tender Wang <[email protected]>
Cc: Paul George <[email protected]>
Cc: Andy Fan <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Cc: [email protected]
Subject: Re: Eager aggregation, take 3
Date: Fri, 5 Sep 2025 09:09:56 -0400
Message-ID: <CA+Tgmob3xCPxNP7n91Tu3hg08i8GbyHY=Dd4CrsXxtcdT9E0Ww@mail.gmail.com> (raw)
In-Reply-To: <CAMbWs49qiox13EKb7bqgLu7Gu9oar+xe6KMwBjgFwod3JzPfUw@mail.gmail.com>
References: <CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com>
	<[email protected]>
	<CAMbWs49=eAd2W9jCtGhaZPPp+SOC_2rg16RTG74xAht=hkr5JQ@mail.gmail.com>
	<CAMbWs49Nc4M3H+eCf1+8w8piDyEECjRb-gK_JMF4VvcyWwGEVQ@mail.gmail.com>
	<CAMbWs49E_dR0nobsExsyetpnBpHObLTsQLsEbWKQLkh0omPxNg@mail.gmail.com>
	<CAMbWs49B_qUiHvu2EqLHZRpLr3p_+QPBs50n2=L5ibYzniwTzA@mail.gmail.com>
	<CAMbWs48KCQtDymnYi4M=Vz+WMzo3fkBxffJsyk6VX6hOXXv+VA@mail.gmail.com>
	<CAMbWs49sv_MuOYqqrtmBN_oYf8VSQ2BXDwXaTpJTn_YfwyYdWQ@mail.gmail.com>
	<CAMbWs49U8Sddx_fGszPdvA3jp_nheynxaqm5Y4NqMV21VBYAuQ@mail.gmail.com>
	<CAMbWs4-LwyOg9ga+NVF7yQbMi0ZsZdN1G_sO2v=YJHV18=19+A@mail.gmail.com>
	<CALA8mJquG_zCJXfVwash5LKqHGtZXQmq7RfTSaRDUzGYeW=7Rw@mail.gmail.com>
	<CAMbWs4_EjgcBib5+y1LYcGB3EK3Y6R+OOxGKfJo42fDovadk1g@mail.gmail.com>
	<CALA8mJqe0anNM8_V6cOeOQnCHUTQggn7iOQNyQr1VaN_xMjz+w@mail.gmail.com>
	<CAMbWs48eE-s-jCicC8pSVfXk8Ws-ZvUKnsw8qH-DkVBdYv0eJQ@mail.gmail.com>
	<CAMbWs483a7-8M0pDttG44r-+8Gevn9VG0xNceE3WpkEQxJXPZw@mail.gmail.com>
	<CAHewXNmYM6DvR_kaxDL0w0fz9BwKbac+TSU3QS10aA3cXHyMmA@mail.gmail.com>
	<CA+TgmoaxH=P63hLYgyJJcEbMRnw3xi16d=HxFi1j-m7MhH6W_w@mail.gmail.com>
	<CAMbWs4_cOnpGsywj9Jt1WAgzJLW9Rxt5X13cfGz4iN2qvZQ68g@mail.gmail.com>
	<CA+Tgmob0q7bRbsFTVDMjxHE6zA4uDQLQa-s0CtwUw49V53UL_A@mail.gmail.com>
	<CAMbWs4-Xru_eKBeRHFduigSGihdixFWVTR8A+dtMw7Mao+RkJA@mail.gmail.com>
	<CAMbWs49dLjSSQRWeud+KSN0G531ciZdYoLBd5qktXA+3JQm_UQ@mail.gmail.com>
	<CAMbWs48LXGC-Y63YtzEeM-3f0NUXWCUEMs7XwGzywXTjUNMcxQ@mail.gmail.com>
	<CAMbWs48XdzvnwfTHWxQ7qK-yjvdrbwsPpqhJBuKDnO+hcbsVwA@mail.gmail.com>
	<CA+TgmoaO-7RHdyJuizWChXZm7EJGvDcfoePDDEyUA-y8vTB1tg@mail.gmail.com>
	<CAMbWs4-+jXRpKuFMZa08bS34-TBka3qqjVMAUjF=-1RA9BKvgg@mail.gmail.com>
	<CA+TgmoZapU1y59-s3o8oPt7Hv+cxRh_34FMu6MXumomLe+U1Cw@mail.gmail.com>
	<CAMbWs4_sEeeBmucBzbamBMfA9uLxVmOc_MV=ZpSyDbTcrUO_XQ@mail.gmail.com>
	<CA+Tgmob4fnv57PQB0Oox86mHSJQ0vVL249eT=gqPvrMkG7h1zw@mail.gmail.com>
	<CAMbWs489NYyTcCTbrUi7hPXKtNY5vHrrFcHyMRAv=CA5WsszVw@mail.gmail.com>
	<CA+TgmoazmDdcc7NeTo3WM5HW3DASNP4rfZw6X+2nnQKHampOng@mail.gmail.com>
	<CAMbWs49bYr-ULhA+-At0iQ+NaFKy72AWB6jzughk8MPTiY+gMQ@mail.gmail.com>
	<CA+TgmoYa-zexdbc5nO_D6oxPMZYs06hkYwZK5Dufq+4Hhe6uNQ@mail.gmail.com>
	<CAMbWs4_aji0kME490phz6nTXnPToddUn19OF3rLm1g4TbNkuzQ@mail.gmail.com>
	<CA+Tgmoa3+G_=8XuQWN+0ugv6r-WV6ruFESpOxpXAAKrne3oVDQ@mail.gmail.com>
	<CAMbWs49qiox13EKb7bqgLu7Gu9oar+xe6KMwBjgFwod3JzPfUw@mail.gmail.com>

Sorry for the slow response.

On Fri, Jun 13, 2025 at 3:42 AM Richard Guo <[email protected]> wrote:
> The transformation of eager aggregation is:
>
>     GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
>     =
>     GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1)
> JOIN R2 ON J)
>
> This equivalence holds under the following conditions:
>
> 1) AGG is decomposable, meaning that it can be computed in two stages:
> a partial aggregation followed by a final aggregation;
> 2) The set G1 used in the pre-aggregation of R1 includes:
>     * all columns from R1 that are part of the grouping keys G, and
>     * all columns from R1 that appear in the join condition J.
> 3) The grouping operator for any column in G1 must be compatible with
> the operator used for that column in the join condition J.

This proof seems to ignore join-order constraints. I'm not sure to
what degree that influences the ultimate outcome here, but given A
LEFT JOIN (B INNER JOIN C), we cannot simply decide that A and C
comprise R1 and B comprises R2, because it is not actually possible to
do the A-C join first and treat the result as a relation to be joined
to B. That said, I do very much like the explicit enumeration of
criteria that must be met for the optimization to be valid. That makes
it a lot easier to evaluate whether the theory of the patch is
correct.

> To address these concerns, I'm thinking that maybe we can adopt a
> strategy where partial aggregation is only pushed to the lowest
> possible level in the join tree that is deemed useful.  In other
> words, if we can build a grouped path like "AGG(B) JOIN A" -- and
> AGG(B) yields a significant reduction in row count -- we skip
> exploring alternatives like "AGG(A JOIN B)".

I really like this idea. I believe we need some heuristic here and
this seems like a reasonable one. I think there could be a better one,
potentially. For instance, it would be reasonable (in my opinion) to
do some kind of evaluation of AGG(A JOIN B) vs. AGG(B) JOIN A that
does not involve performing full path generation for both cases; e.g.
one could try to decide considering only row counts, for instance.
However, I'm not saying that would work better than your proposal
here, or that it should be a requirement for this to be committed;
it's just an idea. IMHO, the requirement to have something committable
is that there is SOME heuristic limiting the search space and at the
same time the patch can still be demonstrated to give SOME benefit. I
think what you propose here meets those criteria. I also like the fact
that it's simple and easy to understand. If it does go wrong, it will
not be too difficult for someone to understand why it has gone wrong,
which is very desirable.

> I think this heuristic serves as a good starting point, and we can
> look into extending it with more advanced strategies as the feature
> evolves.

So IOW, +1 to what you say here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

view thread (75+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Eager aggregation, take 3
  In-Reply-To: <CA+Tgmob3xCPxNP7n91Tu3hg08i8GbyHY=Dd4CrsXxtcdT9E0Ww@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox