MIME-Version: 1.0
References: <CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com>
 <87il22cj51.fsf@163.com> <CAMbWs49=eAd2W9jCtGhaZPPp+SOC_2rg16RTG74xAht=hkr5JQ@mail.gmail.com>
 <CAMbWs49Nc4M3H+eCf1+8w8piDyEECjRb-gK_JMF4VvcyWwGEVQ@mail.gmail.com>
 <CAMbWs49E_dR0nobsExsyetpnBpHObLTsQLsEbWKQLkh0omPxNg@mail.gmail.com>
 <CAMbWs49B_qUiHvu2EqLHZRpLr3p_+QPBs50n2=L5ibYzniwTzA@mail.gmail.com>
 <CAMbWs48KCQtDymnYi4M=Vz+WMzo3fkBxffJsyk6VX6hOXXv+VA@mail.gmail.com>
 <CAMbWs49sv_MuOYqqrtmBN_oYf8VSQ2BXDwXaTpJTn_YfwyYdWQ@mail.gmail.com>
 <CAMbWs49U8Sddx_fGszPdvA3jp_nheynxaqm5Y4NqMV21VBYAuQ@mail.gmail.com>
 <CAMbWs4-LwyOg9ga+NVF7yQbMi0ZsZdN1G_sO2v=YJHV18=19+A@mail.gmail.com>
 <CALA8mJquG_zCJXfVwash5LKqHGtZXQmq7RfTSaRDUzGYeW=7Rw@mail.gmail.com>
 <CAMbWs4_EjgcBib5+y1LYcGB3EK3Y6R+OOxGKfJo42fDovadk1g@mail.gmail.com>
 <CALA8mJqe0anNM8_V6cOeOQnCHUTQggn7iOQNyQr1VaN_xMjz+w@mail.gmail.com>
 <CAMbWs48eE-s-jCicC8pSVfXk8Ws-ZvUKnsw8qH-DkVBdYv0eJQ@mail.gmail.com>
 <CAMbWs483a7-8M0pDttG44r-+8Gevn9VG0xNceE3WpkEQxJXPZw@mail.gmail.com>
 <CAHewXNmYM6DvR_kaxDL0w0fz9BwKbac+TSU3QS10aA3cXHyMmA@mail.gmail.com>
 <CA+TgmoaxH=P63hLYgyJJcEbMRnw3xi16d=HxFi1j-m7MhH6W_w@mail.gmail.com>
 <CAMbWs4_cOnpGsywj9Jt1WAgzJLW9Rxt5X13cfGz4iN2qvZQ68g@mail.gmail.com>
 <CA+Tgmob0q7bRbsFTVDMjxHE6zA4uDQLQa-s0CtwUw49V53UL_A@mail.gmail.com>
 <CAMbWs4-Xru_eKBeRHFduigSGihdixFWVTR8A+dtMw7Mao+RkJA@mail.gmail.com>
 <CAMbWs49dLjSSQRWeud+KSN0G531ciZdYoLBd5qktXA+3JQm_UQ@mail.gmail.com>
 <CAMbWs48LXGC-Y63YtzEeM-3f0NUXWCUEMs7XwGzywXTjUNMcxQ@mail.gmail.com>
 <CAMbWs48XdzvnwfTHWxQ7qK-yjvdrbwsPpqhJBuKDnO+hcbsVwA@mail.gmail.com>
 <CA+TgmoaO-7RHdyJuizWChXZm7EJGvDcfoePDDEyUA-y8vTB1tg@mail.gmail.com>
 <CAMbWs4-+jXRpKuFMZa08bS34-TBka3qqjVMAUjF=-1RA9BKvgg@mail.gmail.com> <CA+TgmoZapU1y59-s3o8oPt7Hv+cxRh_34FMu6MXumomLe+U1Cw@mail.gmail.com>
In-Reply-To: <CA+TgmoZapU1y59-s3o8oPt7Hv+cxRh_34FMu6MXumomLe+U1Cw@mail.gmail.com>
From: Richard Guo <guofenglinux@gmail.com>
Date: Thu, 16 Jan 2025 17:18:18 +0900
Message-ID: <CAMbWs4_sEeeBmucBzbamBMfA9uLxVmOc_MV=ZpSyDbTcrUO_XQ@mail.gmail.com>
Subject: Re: Eager aggregation, take 3
To: Robert Haas <robertmhaas@gmail.com>
Cc: Tom Lane <tgl@sss.pgh.pa.us>, Tender Wang <tndrwang@gmail.com>, 
	Paul George <p.a.george19@gmail.com>, Andy Fan <zhihuifan1213@163.com>, 
	PostgreSQL-development <pgsql-hackers@postgresql.org>, pgsql-hackers@lists.postgresql.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://www.postgresql.org/message-id/CAMbWs4_sEeeBmucBzbamBMfA9uLxVmOc_MV%3DZpSyDbTcrUO_XQ%40mail.gmail.com>
Precedence: bulk

On Wed, Jan 15, 2025 at 11:40=E2=80=AFPM Robert Haas <robertmhaas@gmail.com=
> wrote:
> On Wed, Jan 15, 2025 at 1:58=E2=80=AFAM Richard Guo <guofenglinux@gmail.c=
om> wrote:
> > I understand that we're currently quite bad at estimating the number
> > of groups after aggregation.  In fact, it's not just aggregation
> > estimates =E2=80=94 we're also bad at join estimates in some cases.  Th=
is is a
> > reality we have to face.  Here's what I think: we should be trying our
> > best to cost each node type as accurately as possible, and then build
> > the upper nodes based on those costs.  We should not conclude that,
> > because we are unable to accurately cost one node type, we should
> > avoid any cost-based optimizations above that node.
>
> Well, I agree with that last sentence, for sure. But I don't think
> it's true that the situations with joins and aggregates are
> comparable. We are much better able to estimate the number of rows
> that will come out of a join than we are to estimate the number of
> rows that come out of an aggregate. It's certainly true that in some
> cases we get join estimates badly wrong, and I'd like to see us do
> better there, but our estimates of the number of distinct values that
> exist in a column are the least reliable part of our statistics system
> by far.

I totally understand that the situation with joins is better than with
aggregates, which is why I said that we're also bad at join estimates
"in some cases" - especially in the cases where we fall back to use
default selectivity estimates.  A simple example:

create table t1 (a int, b int);
create table t2 (a int, b int);

insert into t1 select i, i from generate_series(1,1000)i;
insert into t2 select i, i from generate_series(1000, 1999)i;

analyze t1, t2;

explain analyze select * from t1 join t2 on t1.a > t2.a;

And here is what I got:

 Nested Loop  (cost=3D0.00..15032.50 rows=3D333333 width=3D16)
              (actual time=3D392.841..392.854 rows=3D0 loops=3D1)

If this t1/t2 join is part of a larger SELECT query, I think the cost
estimates for the upper join nodes would likely be quite inaccurate.

> > I believe the HashAggregate node in this plan faces the same problem
> > with inaccurate estimates.  However, I don't think it's reasonable to
> > say that, because we cannot accurately cost the Aggregate node, we
> > should disregard considering JOIN_UNIQUE_OUTER/INNER.
>
> Fair point.
>
> > Back in August, I responded to this issue by "Maybe we can run some
> > benchmarks first and investigate the regressions discovered on a
> > case-by-case basis".  In October, I ran the TPC-DS benchmark at scale
> > 10 and observed that eager aggregation was applied in 7 queries, with
> > no notable regressions.  In contrast, Q4 and Q11 showed performance
> > improvements of 3=E2=80=934 times.  Please see [1].
>
> I had forgotten about that, and again, fair point, but I'm concerned
> that it might not be a broad enough base of queries to test against.
> (7 isn't a very large number.)

Yeah, I know 7 is not a large number, but this is the result I got
from running the TPC-DS benchmark.  For the remaining 92 queries in
the benchmark, either the logic in this patch determines that eager
aggregation is not applicable, or the path with eager aggregation is
not the optimal one.  I'd be more than happy if a benchmark query
showed significant performance regression, so it would provide an
opportunity to investigate how the cost estimates are negatively
impacting the final plan and explore ways to avoid or improve that.
If anyone can provide such a benchmark query, I'd be very grateful.

Perhaps this is another reason why we should enable this feature by
default, so we can identify such regression issues sooner rather than
later.

> > Actually, I introduced the EAGER_AGGREGATE_RATIO mechanism in October
> > to limit the planning effort for eager aggregation.  For more details,
> > please see [2].
>
> OK, I missed this, but...
>
> > And I don't think it's correct to say that we create a partially
> > grouped rel for every baserel and every joinrel.  This patch includes
> > a bunch of logic to determine whether it's appropriate to create a
> > grouped rel for a base or join rel.  Furthermore, with the
> > EAGER_AGGREGATE_RATIO mechanism, even if creating a grouped rel is
> > possible, we will skip it if the grouped paths are considered not
> > useful.  All of these measures help reduce the number of grouped
> > paths as well as the grouped relations in many cases where eager
> > aggregation would not help a lot.
>
> ...it looks to me like EAGER_AGGREGATE_RATIO is used to set the
> RelAggInfo's agg_useful field, which seems like it happens after the
> RelOptInfo has already been created. I had been looking for something
> that would avoid creating the RelOptInfo in the first place and I
> didn't see it.

Well, from the perspective of planning effort, what really matters is
whether the RelOptInfo for the grouped relation is added to the
PlannerInfo, as it is only then available for further joining in the
join search routine, not whether the RelOptInfo is built or not.
Building the RelOptInfo for a grouped relation is simply a makeNode
call followed by a flat copy; it doesn't require going through the
full process of determining its target list, or constructing its
restrict and join clauses, or calculating size estimates, etc.

Now, let's take a look at how the EAGER_AGGREGATE_RATIO mechanism is
used.  As you mentioned, EAGER_AGGREGATE_RATIO is used to set the
agg_useful field of the RelAggInfo.  For a base rel where we've
decided that aggregation can be pushed down, if agg_useful is false,
we skip building the grouped relation for it in the first place, not
to mention adding the grouped relation to the PlannerInfo.  For a join
rel where aggregation can be pushed down, if agg_useful is false, we
will create a temporary RelOptInfo for its grouped relation, but we
only add this RelOptInfo to the PlannerInfo if we can generate any
grouped paths by joining its input relations.  We could easily modify
make_grouped_join_rel() to create this temporary RelOptInfo only when
needed, but I'm not sure if that's necessary, since I don't have data
to suggest that the creation of this temporary RelOptInfo is a factor
in causing planning regressions.

> > Based on the TPC-DS benchmark results, I don't see "a lot of overhead"
> > in the planning cost, at least for the 7 queries where eager
> > aggregation is applied.  As I said in [1], "For the planning time, I
> > do not see notable regressions for any of the seven queries".  In
> > fact, I initially thought that we might consider enabling this by
> > default, given the positive benchmark results, but I just couldn't
> > summon the courage to do it.  Perhaps we should reconsider enabling it
> > by default, so users can benefit from the new feature and help
> > identify any potential bugs.
>
> If you're going to commit this, I think it would be a good idea to
> enable it by default at least for now. If there are problems, it's
> better to find out about them sooner rather than later. If they are
> minor they can be fixed; if they are major, we can consider whether it
> is better to fix them, disable the feature by default, or revert. We
> can add an open item to reconsider the default setting during beta.

Agreed.  And I like the suggestion of adding an open item about the
default setting during beta.

> > > As I said back in October, this seems like mixing together in one
> > > RelOptInfo paths that really belong to two different RelOptInfos.
> >
> > I understand that you said about the design in October where
> > "PartialAgg(t1 JOIN t2) and t1 JOIN PartialAgg(t2) get separate
> > RelOptInfos", because "it's less clear whether it's fair to compare
> > across the two categories".  I've shared my thoughts on this in [5].
> >
> > Furthermore, even if we separate these grouped paths into two
> > different RelOptInfos, we still face the issue that "different paths
> > of a grouped relation can have very different row counts", and we need
> > a way to handle this.  One could argue that we can separate the
> > grouped paths where partial aggregation is placed at different
> > locations into different RelOptInfos, but this would lead to an
> > explosion in the number of RelOptInfos for grouped relations as we
> > climb up the join tree.  I think this is neither realistic nor
> > necessary.
>
> It's possible you're right, but it does make me nervous. I do agree
> that making the number of RelOptInfos explode would be really bad.

Based on my explanation in [1], I think it's acceptable to compare
grouped paths for the same grouped rel, regardless of where the
partial aggregation is placed.

I fully understand that I could be wrong about this, but I don't think
it would break anything in regular planning (i.e., planning without
eager aggregation).  We would never compare a grouped path with a
non-grouped path during scan/join planning.  As far as I can see, the
only consequence in that case would be that we might fail to select
the optimal grouped path and miss out on fully leveraging the benefits
of eager aggregation.

Back in November, I considered the possibility of introducing a
GroupPathInfo into the Path structure to store the location of the
partial aggregation as well as the estimated rowcount for this grouped
path, similar to how ParamPathInfo functions for parameterized paths.
However, after some exploration, I determined that this was
unnecessary.

But in any case, I don't think it's an option to separate the grouped
paths of the same grouped relation into different RelOptInfos based on
the location of the partial aggregation within the path tree.

[1] https://postgr.es/m/CAMbWs49dLjSSQRWeud+KSN0G531ciZdYoLBd5qktXA+3JQm_UQ=
@mail.gmail.com

Thanks
Richard