public inbox for [email protected]  
help / color / mirror / Atom feed
From: Richard Guo <[email protected]>
To: Paul George <[email protected]>
Cc: Andy Fan <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Cc: [email protected]
Subject: Re: Eager aggregation, take 3
Date: Wed, 10 Jul 2024 16:27:02 +0800
Message-ID: <CAMbWs4_EjgcBib5+y1LYcGB3EK3Y6R+OOxGKfJo42fDovadk1g@mail.gmail.com> (raw)
In-Reply-To: <CALA8mJquG_zCJXfVwash5LKqHGtZXQmq7RfTSaRDUzGYeW=7Rw@mail.gmail.com>
References: <CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com>
	<[email protected]>
	<CAMbWs49=eAd2W9jCtGhaZPPp+SOC_2rg16RTG74xAht=hkr5JQ@mail.gmail.com>
	<CAMbWs49Nc4M3H+eCf1+8w8piDyEECjRb-gK_JMF4VvcyWwGEVQ@mail.gmail.com>
	<CAMbWs49E_dR0nobsExsyetpnBpHObLTsQLsEbWKQLkh0omPxNg@mail.gmail.com>
	<CAMbWs49B_qUiHvu2EqLHZRpLr3p_+QPBs50n2=L5ibYzniwTzA@mail.gmail.com>
	<CAMbWs48KCQtDymnYi4M=Vz+WMzo3fkBxffJsyk6VX6hOXXv+VA@mail.gmail.com>
	<CAMbWs49sv_MuOYqqrtmBN_oYf8VSQ2BXDwXaTpJTn_YfwyYdWQ@mail.gmail.com>
	<CAMbWs49U8Sddx_fGszPdvA3jp_nheynxaqm5Y4NqMV21VBYAuQ@mail.gmail.com>
	<CAMbWs4-LwyOg9ga+NVF7yQbMi0ZsZdN1G_sO2v=YJHV18=19+A@mail.gmail.com>
	<CALA8mJquG_zCJXfVwash5LKqHGtZXQmq7RfTSaRDUzGYeW=7Rw@mail.gmail.com>

On Sun, Jul 7, 2024 at 10:45 AM Paul George <[email protected]> wrote:
> Thanks for reviving this patch and for all of your work on it! Eager aggregation pushdown will be beneficial for my work and I'm hoping to see it land.

Thanks for looking at this patch!

> The output of both the original query and this one match (and the plans with eager aggregation and the subquery are nearly identical if you restore the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does this mean that there are conditions under which eager aggregation can be pushed down to the nullable side?

I think it's a very risky thing to push a partial aggregation down to
the nullable side of an outer join, because the NULL-extended rows
produced by the outer join would not be available when we perform the
partial aggregation, while with a non-eager-aggregation plan these
rows are available for the top-level aggregation.  This may put the
rows into groups in a different way than expected, or get wrong values
from the aggregate functions.  I've managed to compose an example:

create table t (a int, b int);
insert into t select 1, 1;

select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
t2.a having t2.a is null;
 a | count
---+-------
   |     1
(1 row)

This is the expected result, because after the outer join we have got
a NULL-extended row.

But if we somehow push down the partial aggregation to the nullable
side of this outer join, we would get a wrong result.

explain (costs off)
select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
t2.a having t2.a is null;
                QUERY PLAN
-------------------------------------------
 Finalize HashAggregate
   Group Key: t2.a
   ->  Nested Loop Left Join
         Filter: (t2.a IS NULL)
         ->  Seq Scan on t t1
         ->  Materialize
               ->  Partial HashAggregate
                     Group Key: t2.a
                     ->  Seq Scan on t t2
                           Filter: (b > 1)
(10 rows)

select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
t2.a having t2.a is null;
 a | count
---+-------
   |     0
(1 row)

I believe there are cases where pushing a partial aggregation down to
the nullable side of an outer join can be safe, but I doubt that there
is an easy way to identify these cases and do the push-down for them.
So for now I think we'd better refrain from doing that.

Thanks
Richard






view thread (30+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Eager aggregation, take 3
  In-Reply-To: <CAMbWs4_EjgcBib5+y1LYcGB3EK3Y6R+OOxGKfJo42fDovadk1g@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox