MIME-Version: 1.0
References: <CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com>
 <87il22cj51.fsf@163.com> <CAMbWs49=eAd2W9jCtGhaZPPp+SOC_2rg16RTG74xAht=hkr5JQ@mail.gmail.com>
 <CAMbWs49Nc4M3H+eCf1+8w8piDyEECjRb-gK_JMF4VvcyWwGEVQ@mail.gmail.com>
 <CAMbWs49E_dR0nobsExsyetpnBpHObLTsQLsEbWKQLkh0omPxNg@mail.gmail.com>
 <CAMbWs49B_qUiHvu2EqLHZRpLr3p_+QPBs50n2=L5ibYzniwTzA@mail.gmail.com>
 <CAMbWs48KCQtDymnYi4M=Vz+WMzo3fkBxffJsyk6VX6hOXXv+VA@mail.gmail.com>
 <CAMbWs49sv_MuOYqqrtmBN_oYf8VSQ2BXDwXaTpJTn_YfwyYdWQ@mail.gmail.com>
 <CAMbWs49U8Sddx_fGszPdvA3jp_nheynxaqm5Y4NqMV21VBYAuQ@mail.gmail.com>
 <CAMbWs4-LwyOg9ga+NVF7yQbMi0ZsZdN1G_sO2v=YJHV18=19+A@mail.gmail.com>
 <CALA8mJquG_zCJXfVwash5LKqHGtZXQmq7RfTSaRDUzGYeW=7Rw@mail.gmail.com>
 <CAMbWs4_EjgcBib5+y1LYcGB3EK3Y6R+OOxGKfJo42fDovadk1g@mail.gmail.com>
 <CALA8mJqe0anNM8_V6cOeOQnCHUTQggn7iOQNyQr1VaN_xMjz+w@mail.gmail.com>
 <CAMbWs48eE-s-jCicC8pSVfXk8Ws-ZvUKnsw8qH-DkVBdYv0eJQ@mail.gmail.com>
 <CAMbWs483a7-8M0pDttG44r-+8Gevn9VG0xNceE3WpkEQxJXPZw@mail.gmail.com>
 <CAHewXNmYM6DvR_kaxDL0w0fz9BwKbac+TSU3QS10aA3cXHyMmA@mail.gmail.com>
 <CA+TgmoaxH=P63hLYgyJJcEbMRnw3xi16d=HxFi1j-m7MhH6W_w@mail.gmail.com>
 <CAMbWs4_cOnpGsywj9Jt1WAgzJLW9Rxt5X13cfGz4iN2qvZQ68g@mail.gmail.com> <CA+Tgmob0q7bRbsFTVDMjxHE6zA4uDQLQa-s0CtwUw49V53UL_A@mail.gmail.com>
In-Reply-To: <CA+Tgmob0q7bRbsFTVDMjxHE6zA4uDQLQa-s0CtwUw49V53UL_A@mail.gmail.com>
From: Richard Guo <guofenglinux@gmail.com>
Date: Fri, 1 Nov 2024 14:54:28 +0900
Message-ID: <CAMbWs4-Xru_eKBeRHFduigSGihdixFWVTR8A+dtMw7Mao+RkJA@mail.gmail.com>
Subject: Re: Eager aggregation, take 3
To: Robert Haas <robertmhaas@gmail.com>
Cc: Tender Wang <tndrwang@gmail.com>, Paul George <p.a.george19@gmail.com>, 
	Andy Fan <zhihuifan1213@163.com>, 
	PostgreSQL-development <pgsql-hackers@postgresql.org>, pgsql-hackers@lists.postgresql.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://www.postgresql.org/message-id/CAMbWs4-Xru_eKBeRHFduigSGihdixFWVTR8A%2BdtMw7Mao%2BRkJA%40mail.gmail.com>
Precedence: bulk

On Wed, Oct 30, 2024 at 5:06=E2=80=AFAM Robert Haas <robertmhaas@gmail.com>=
 wrote:
> On Tue, Sep 24, 2024 at 11:20=E2=80=AFPM Richard Guo <guofenglinux@gmail.=
com> wrote:
> > The reason is that it is very tricky to set the size estimates for a
> > grouped join relation.  For a non-grouped join relation, we know that
> > all its paths have the same rowcount estimate (well, in theory).  But
> > this is not true for a grouped join relation.  Suppose we have a
> > grouped join relation for t1/t2 join.  There might be two paths for
> > it:
>
> What exactly do you mean by "well, in theory" here? My understanding
> of how things work today is that every relation is supposed to produce
> a specific set of rows and every unparameterized path must produce
> that set of rows. The order of the rows may vary but the set of rows
> may not. With your proposed design here, that's no longer true.
> Instead, what's promised is that the row sets will become equivalent
> after a later FinalizeAggregate step. In a sense, this is like
> parameterization or partial paths.

Yeah, you're correct that currently each relation is expected to
produce the same specific set of rows.  When I say "well, in theory" I
mean that for a join relation, all its unparameterized paths should
theoretically have the same row count estimate.  However, in practice,
because there are more than one way to make a joinrel for more than
two base relations, and the selectivity estimation routines don=E2=80=99t
handle all cases equally well, we might get different row count
estimates depending on the pair provided.

And yes, with the grouped relations proposed in this patch, the
situation changes.  For a grouped join relation, different paths can
produce very different row sets, depending on where the partial
aggregation is placed in the path tree.  This is also why we need to
recalculate the row count estimate for a grouped join path using its
outer and inner paths provided, rather than using path->parent->rows
directly.  This is very like the parameterized path case.

> I think what you're doing here is roughly equivalent to either of
> these two cases. It's more like the parameterized path case. Instead
> of having a path for a relation which is parameterized by some input
> parameter, you have a path for a relation, say bar, which is partially
> aggregated by some grouping column. But there's no guarantee of how
> much partial aggregation has been done. In your example, PartialAgg(t1
> JOIN t2) is "more aggregated" than t1 JOIN PartialAgg(t2), so the row
> counts are different. This makes me quite nervous. You can't compare a
> parameterized path to an unparameterized path, but you can compare it
> to another parameterized path if the parameterizations are the same.
> You can't compare a partial path to a non-partial path, but you can
> compare partial paths to each other. But with this work,
> unparameterized, non-partial paths in the same RelOptInfo don't seem
> like they are truly comparable. Maybe that's OK, but I'm not sure that
> it isn't going to break other things.

Perhaps we could introduce a GroupPathInfo into the Path structure to
store common information for a grouped path, such as the location of
the partial aggregation (which could be the relids of the relation on
top of which we place the partial aggregation) and the estimated
rowcount for this grouped path, similar to how ParamPathInfo functions
for parameterized paths.  Then we should be able to compare the
grouped paths of the same location apples to apples.  I haven=E2=80=99t
thought this through in detail yet, though.

> It's tempting to wonder if there's some way that we can avoid
> generating paths for both PartialAgg(t1 JOIN t2) and t1 JOIN
> PartialAgg(t2). Either the former has lower cardinality, or the latter
> does. It seems likely that the lower-cardinality set is the winning
> strategy. Even if the path has higher cost to generate, we save work
> at every subsequent join level and at the final aggregation step. Are
> there counterexamples where it's better to  use a path from the
> higher-cardinality set?

This is very appealing if we can retain only the lower-cardinality
path, as it would greatly simplify the overall design.  I haven't
looked for counterexamples yet, but I plan to do so later.

> By the way, the work of figuring out what target list should be
> produced by partial grouping is done by init_grouping_targets(), but
> the comments seem to take it for granted that I know what result we're
> trying to produce, and I don't. I think some more high-level
> explanation of the goals of this code would be useful. It seems to me
> that if I'm looking at a path for an ungrouped relation and it
> produces a certain target list, then every column of that target list
> is needed somewhere. If those columns are group keys, cool: we pass
> those through. If they're inputs to the aggregates, cool: we feed them
> to the aggregates. But if they are neither, then what? In the patch,
> you either group on those columns or add them to the
> possibly_dependent list depending on the result of
> is_var_needed_by_join(). I can believe that there are some cases where
> we can group on such columns and others where we can't, but find it
> difficult to believe that this test reliably distinguishes between
> those two cases. If it does, I don't understand why it does. Don't I
> need to know something about how those columns are used in the upper
> joins? Like, if those columns are connected by a chain of
> binary-equality operators back to the user's choice of grouping
> columns, that sounds good, but this test doesn't distinguish between
> that case and an upper join on the < operator.
> create_grouping_expr_infos() does reason based on whether there's an
> equal-image operator available, but AIUI that's only reasoning about
> the group columns the user mentioned, not the sort of implicit
> grouping columns that init_grouping_targets() is creating.

Yeah, this patch does not get it correct here.  Basically the logic is
that for the partial aggregation pushed down to a non-aggregated
relation, we need to consider all columns of that relation involved in
upper join clauses and include them in the grouping keys.  Currently,
the patch only checks whether a column is involved in upper join
clauses but does not verify how the column is used.  We need to ensure
that the operator used in the join clause is at least compatible with
the grouping operator; otherwise, the grouping operator might
interpret the values as the same while the join operator sees them as
different.

Thanks
Richard