public inbox for [email protected]  
help / color / mirror / Atom feed
From: Richard Guo <[email protected]>
To: Robert Haas <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Tender Wang <[email protected]>
Cc: Paul George <[email protected]>
Cc: Andy Fan <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Cc: [email protected]
Cc: Matheus Alcantara <[email protected]>
Subject: Re: Eager aggregation, take 3
Date: Sat, 13 Sep 2025 17:27:41 +0900
Message-ID: <CAMbWs484dnecwXT2WzWFvzEmWPzC3U9F8SRDXg-SEegTYUFyXQ@mail.gmail.com> (raw)
In-Reply-To: <CA+TgmoaY6E5-UFTWp5BtAjBO=tDQd=UVAgeJ3dRbFFzhnP5NHg@mail.gmail.com>
References: <CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com>
	<[email protected]>
	<CAMbWs49=eAd2W9jCtGhaZPPp+SOC_2rg16RTG74xAht=hkr5JQ@mail.gmail.com>
	<CAMbWs49Nc4M3H+eCf1+8w8piDyEECjRb-gK_JMF4VvcyWwGEVQ@mail.gmail.com>
	<CAMbWs49E_dR0nobsExsyetpnBpHObLTsQLsEbWKQLkh0omPxNg@mail.gmail.com>
	<CAMbWs49B_qUiHvu2EqLHZRpLr3p_+QPBs50n2=L5ibYzniwTzA@mail.gmail.com>
	<CAMbWs48KCQtDymnYi4M=Vz+WMzo3fkBxffJsyk6VX6hOXXv+VA@mail.gmail.com>
	<CAMbWs49sv_MuOYqqrtmBN_oYf8VSQ2BXDwXaTpJTn_YfwyYdWQ@mail.gmail.com>
	<CAMbWs49U8Sddx_fGszPdvA3jp_nheynxaqm5Y4NqMV21VBYAuQ@mail.gmail.com>
	<CAMbWs4-LwyOg9ga+NVF7yQbMi0ZsZdN1G_sO2v=YJHV18=19+A@mail.gmail.com>
	<CALA8mJquG_zCJXfVwash5LKqHGtZXQmq7RfTSaRDUzGYeW=7Rw@mail.gmail.com>
	<CAMbWs4_EjgcBib5+y1LYcGB3EK3Y6R+OOxGKfJo42fDovadk1g@mail.gmail.com>
	<CALA8mJqe0anNM8_V6cOeOQnCHUTQggn7iOQNyQr1VaN_xMjz+w@mail.gmail.com>
	<CAMbWs48eE-s-jCicC8pSVfXk8Ws-ZvUKnsw8qH-DkVBdYv0eJQ@mail.gmail.com>
	<CAMbWs483a7-8M0pDttG44r-+8Gevn9VG0xNceE3WpkEQxJXPZw@mail.gmail.com>
	<CAHewXNmYM6DvR_kaxDL0w0fz9BwKbac+TSU3QS10aA3cXHyMmA@mail.gmail.com>
	<CA+TgmoaxH=P63hLYgyJJcEbMRnw3xi16d=HxFi1j-m7MhH6W_w@mail.gmail.com>
	<CAMbWs4_cOnpGsywj9Jt1WAgzJLW9Rxt5X13cfGz4iN2qvZQ68g@mail.gmail.com>
	<CA+Tgmob0q7bRbsFTVDMjxHE6zA4uDQLQa-s0CtwUw49V53UL_A@mail.gmail.com>
	<CAMbWs4-Xru_eKBeRHFduigSGihdixFWVTR8A+dtMw7Mao+RkJA@mail.gmail.com>
	<CAMbWs49dLjSSQRWeud+KSN0G531ciZdYoLBd5qktXA+3JQm_UQ@mail.gmail.com>
	<CAMbWs48LXGC-Y63YtzEeM-3f0NUXWCUEMs7XwGzywXTjUNMcxQ@mail.gmail.com>
	<CAMbWs48XdzvnwfTHWxQ7qK-yjvdrbwsPpqhJBuKDnO+hcbsVwA@mail.gmail.com>
	<CA+TgmoaO-7RHdyJuizWChXZm7EJGvDcfoePDDEyUA-y8vTB1tg@mail.gmail.com>
	<CAMbWs4-+jXRpKuFMZa08bS34-TBka3qqjVMAUjF=-1RA9BKvgg@mail.gmail.com>
	<CA+TgmoZapU1y59-s3o8oPt7Hv+cxRh_34FMu6MXumomLe+U1Cw@mail.gmail.com>
	<CAMbWs4_sEeeBmucBzbamBMfA9uLxVmOc_MV=ZpSyDbTcrUO_XQ@mail.gmail.com>
	<CA+Tgmob4fnv57PQB0Oox86mHSJQ0vVL249eT=gqPvrMkG7h1zw@mail.gmail.com>
	<CAMbWs489NYyTcCTbrUi7hPXKtNY5vHrrFcHyMRAv=CA5WsszVw@mail.gmail.com>
	<CA+TgmoazmDdcc7NeTo3WM5HW3DASNP4rfZw6X+2nnQKHampOng@mail.gmail.com>
	<CAMbWs49bYr-ULhA+-At0iQ+NaFKy72AWB6jzughk8MPTiY+gMQ@mail.gmail.com>
	<CA+TgmoYa-zexdbc5nO_D6oxPMZYs06hkYwZK5Dufq+4Hhe6uNQ@mail.gmail.com>
	<CAMbWs4_aji0kME490phz6nTXnPToddUn19OF3rLm1g4TbNkuzQ@mail.gmail.com>
	<CA+Tgmoa3+G_=8XuQWN+0ugv6r-WV6ruFESpOxpXAAKrne3oVDQ@mail.gmail.com>
	<CAMbWs49qiox13EKb7bqgLu7Gu9oar+xe6KMwBjgFwod3JzPfUw@mail.gmail.com>
	<CAMbWs48F8WGA-Lzj1Dk76mFqRFxPEwG2_9Zb7+pFs8oi6ew2pw@mail.gmail.com>
	<CAMbWs484ms=WRZamOyWnVditREKFqipLsdaQjcv2uKur8SZuqw@mail.gmail.com>
	<CAMbWs49bL2ZMSc0W4G8=R7bjaa-vO6grucEOFYLZFUZE7+nzrQ@mail.gmail.com>
	<CA+TgmobqbeJ9iRQO4ym6OiHt71sSv2eai=01kOZjxhdof9K4Mw@mail.gmail.com>
	<CAMbWs4_2BzuAX+BSO1p7rtUwmQjORrG-b906Cw-RkfRjFP0oSQ@mail.gmail.com>
	<CA+TgmoYbkvYwLa+1vOP7RDY7kO2=A7rppoPusoRXe44VDOGBPg@mail.gmail.com>
	<CAMbWs4_aezTYOZSj7v+aypLo0dnjAierJtdx2gf6se28p88WHg@mail.gmail.com>
	<CA+TgmoaY6E5-UFTWp5BtAjBO=tDQd=UVAgeJ3dRbFFzhnP5NHg@mail.gmail.com>

On Sat, Sep 13, 2025 at 3:48 AM Robert Haas <[email protected]> wrote:
> On Fri, Sep 12, 2025 at 5:34 AM Richard Guo <[email protected]> wrote:
> > I really like this idea.  Currently, aggtransspace represents an
> > estimate of the transition state size provided by the aggregate
> > definition.  If it's set to zero, a default estimate based on the
> > state data type is used.  Negative values currently have no defined
> > meaning.  I think it makes perfect sense to reuse this field so that
> > a negative value indicates that the transition state data can grow
> > unboundedly in size.
> >
> > Attached 0002 implements this idea.  It requires fewer code changes
> > than I expected.  This is mainly because that our current code uses
> > aggtransspace in such a way that if it's a positive value, that value
> > is used as it's provided by the aggregate definition; otherwise, some
> > heuristics are applied to estimate the size.  For the aggregates that
> > accumulate input rows (e.g., array_agg, string_agg), I don't currently
> > have a better heuristic for estimating their size, so I've chosen to
> > keep the current logic.  This won't regress anything in estimating
> > transition state data size.

> This might be OK, but it's not what I was suggesting: I was suggesting
> trying to do a calculation like space_used = -aggtransspace *
> rowcount, not just using a <0 value as a sentinel.

I've considered your suggestion, but I'm not sure I'll adopt it in the
end.  Here's why:

1) At the point where we check whether any aggregates might pose a
risk of excessive memory usage during partial aggregation, row count
information is not yet available.  You could argue that we could
reorganize the logic to perform this check after we've had the row
count, but that seems quite tricky.  If I understand correctly, the
"rowcount" in this context actually means the number of rows within
one partial group.  That would require us to first decide on the
grouping expressions for the partial aggregation, then compute the
group row counts, then estimate space usage, and only then decide
whether memory usage is excessive and fall back.  This would come
quite late in planning and adds nontrivial overhead, compared to the
current approach which checks at the very beginning.

2) Even if we were able to estimate space usage based on the number of
rows per partial group and determined that memory usage seems
acceptable, we still couldn't guarantee that the transition state data
won't grow excessively after further joins.  Joins can multiply
partial aggregates, potentially causing a blowup in memory usage even
if the initial estimate seemed safe.

3) I don't think "-aggtransspace * rowcount" reflects the true memory
footprint for aggregates that accumulate input rows.  For example,
what if we have an aggregate like string_agg(somecolumn, 'a very long
delimiter')?

4) AFAICS, the main downside of the current approach compared to yours
is that it avoids pushing down aggregates like string_agg() that
accumulate input rows, whereas your suggestion might allow pushing
them down in some cases where we *think* it wouldn't blow up memory.
You might argue that the current implementation is over-conservative.
But I prefer to start safe.

That said, I appreciate you proposing the idea of reusing
aggtransspace, although I ended up using it in a different way than
you suggested.

- Richard





view thread (75+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Eager aggregation, take 3
  In-Reply-To: <CAMbWs484dnecwXT2WzWFvzEmWPzC3U9F8SRDXg-SEegTYUFyXQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox