Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sQHuS-008uwi-GK for pgsql-hackers@arkaria.postgresql.org; Sun, 07 Jul 2024 02:45:52 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sQHuP-008Js7-0I for pgsql-hackers@arkaria.postgresql.org; Sun, 07 Jul 2024 02:45:49 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sQHuO-008Jry-IN for pgsql-hackers@lists.postgresql.org; Sun, 07 Jul 2024 02:45:48 +0000 Received: from mail-ej1-x633.google.com ([2a00:1450:4864:20::633]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sQHuM-000omI-6h for pgsql-hackers@postgresql.org; Sun, 07 Jul 2024 02:45:47 +0000 Received: by mail-ej1-x633.google.com with SMTP id a640c23a62f3a-a77d876273dso100683866b.0 for ; Sat, 06 Jul 2024 19:45:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720320344; x=1720925144; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=67HUUePJJdgLlxaXABhM7oYUsw24O4Z3NTesI5yF2aE=; b=AhrXC7oJwnaQTa5rQxzzrUj0WWlB0paFpkGOS7vjHSje/5JnhndGLBc12nXRSqmuwI oSPNxmBqGw1UcWo3DYNLC77oNjjxHJOXyjt31OnjGuLLjqCEni2LSsprr5FO3hKnfkrY go64F2RaTAVyhWucsiZzzhxS5nZ8r6q5TNsCjnnfY9GKZnrhWg5P45yLtUilf25oT8iT bqjtKKbYYSAVEF8vKq7ogejvaM/KkXROJibEDG+wzvO6cbcDHWZcX39WBWgF9zOOda5+ XC1EWUGlcvY6+zYcS/WElQBNu89KFQyBxjnVufw5g6sKb/n+7+Kkg38TR4h4lC3Wi3ga Tbfw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720320344; x=1720925144; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=67HUUePJJdgLlxaXABhM7oYUsw24O4Z3NTesI5yF2aE=; b=IYgGr45xbNu4mqEpOiCcodT9fiaSr2a/lUWVZyHA7/i3MPCNDMuwdmwAfkKg5FbJV7 JvkprYNmsljBmpHWDte/YxxtRgH0iD0pS/zw+7xgy5Sc7mW8BdpoLClNcf+ZvHu1jq7t lgqAM/hwC2KrXQUlJOkAWjBX3twuHduJVoPlo6nNnSlDQ6JH9f8cXP4y1IoaazIEnZ9r MoVIyzOXubfU2RWNmAOJrludXuho1aIu61PyaccsOc7JYUV9m2/7V0iZR+fvc2as6Z5K m3neeqmyY72tCnjxGbZ6w/rjja9bXTe41qMjZ5rLkeESavaqw51ZNVPKHMhHLE2RXxAA cGGg== X-Forwarded-Encrypted: i=1; AJvYcCVLGGF+l5O9KFDKYuWhI0SznjAEq2mGmSk4XbAXLjexAaJ7ePPKTpers091gcEMDAKxKDEkuqLJC6/z7n8Akljr0dea6y9s67AK0n1Z X-Gm-Message-State: AOJu0Ywr3UfR+0XzWVdvCb+WMlwTmLCOopCMB0lRAv+Ir2VZLe7gnQ3J fTSQOjJkcQaWB6rg6M5+2SG3O2Ei5zuT/paixHXBd8CqZGZo9GxYqrVkMTbXv5/purijLvVl5sD bQnJzJX/apf3C4DeeDViLE6Ste4I= X-Google-Smtp-Source: AGHT+IHQaZBUpMZWHIqypFpQSBs8wD5SivOPlGLkbLJOEfArsJ0S/X7bgw9NcEu1UZuasFuxFpiHYRVlaiV/PPwMzro= X-Received: by 2002:a05:6402:42c6:b0:58a:fa6e:9dae with SMTP id 4fb4d7f45d1cf-58e59f2a54emr8256293a12.15.1720320343708; Sat, 06 Jul 2024 19:45:43 -0700 (PDT) MIME-Version: 1.0 References: <87il22cj51.fsf@163.com> In-Reply-To: From: Paul George Date: Sat, 6 Jul 2024 19:45:32 -0700 Message-ID: Subject: Re: Eager aggregation, take 3 To: Richard Guo Cc: Andy Fan , PostgreSQL-development , pgsql-hackers@lists.postgresql.org Content-Type: multipart/alternative; boundary="000000000000fc99f6061c9f4a95" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000fc99f6061c9f4a95 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Richard: Thanks for reviving this patch and for all of your work on it! Eager aggregation pushdown will be beneficial for my work and I'm hoping to see it land. I was playing around with v9 of the patches and was specifically curious about this previous statement... >This patch also makes eager aggregation work with outer joins. With >outer join, the aggregate cannot be pushed down if any column referenced >by grouping expressions or aggregate functions is nullable by an outer >join above the relation to which we want to apply the partiall >aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we >can easily identify such situations and subsequently refrain from >pushing down the aggregates. ...and this related comment in eager_aggregate.out: >-- Ensure aggregation cannot be pushed down to the nullable side While I'm new to this work and its subtleties, I'm wondering if this is too broad a condition. I modified the first test query in eager_aggregate.sql to make it a LEFT JOIN and eager aggregation indeed did not happen, which is expected based on the comments upthread. query: SET enable_eager_aggregate=3DON; EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a, sum(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b =3D t2.b GROUP BY t1.a ORDER BY t1.a; plan: QUERY PLAN ------------------------------------------------------------ GroupAggregate Output: t1.a, sum(t2.c) Group Key: t1.a -> Sort Output: t1.a, t2.c Sort Key: t1.a -> Hash Right Join Output: t1.a, t2.c Hash Cond: (t2.b =3D t1.b) -> Seq Scan on public.eager_agg_t2 t2 Output: t2.a, t2.b, t2.c -> Hash Output: t1.a, t1.b -> Seq Scan on public.eager_agg_t1 t1 Output: t1.a, t1.b (15 rows) (NOTE: I changed the aggregate from avg(...) to sum(...) for simplicity) But, it seems that eager aggregation for the query above can be "replicated" as: query: EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a, sum(t2.c) FROM eager_agg_t1 t1 LEFT JOIN ( SELECT b, sum(c) c FROM eager_agg_t2 t2p GROUP BY b ) t2 ON t1.b =3D t2.b GROUP BY t1.a ORDER BY t1.a; The output of both the original query and this one match (and the plans with eager aggregation and the subquery are nearly identical if you restore the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does this mean that there are conditions under which eager aggregation can be pushed down to the nullable side? -Paul- On Sat, Jul 6, 2024 at 4:56=E2=80=AFPM Richard Guo = wrote: > On Thu, Jun 13, 2024 at 4:07=E2=80=AFPM Richard Guo > wrote: > > I spent some time testing this patchset and found a few more issues. > > ... > > > Hence here is the v8 patchset, with fixes for all the above issues. > > I found an 'ORDER/GROUP BY expression not found in targetlist' error > with this patchset, with the query below: > > create table t (a boolean); > > set enable_eager_aggregate to on; > > explain (costs off) > select min(1) from t t1 left join t t2 on t1.a group by (not (not > t1.a)), t1.a order by t1.a; > ERROR: ORDER/GROUP BY expression not found in targetlist > > This happens because the two grouping items are actually the same and > standard_qp_callback would remove one of them. The fully-processed > groupClause is kept in root->processed_groupClause. However, when > collecting grouping expressions in create_grouping_expr_infos, we are > checking parse->groupClause, which is incorrect. > > The fix is straightforward: check root->processed_groupClause instead. > > Here is a new rebase with this fix. > > Thanks > Richard > --000000000000fc99f6061c9f4a95 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
R= ichard:

Thanks for reviving this patch and for all of your work on i= t! Eager aggregation pushdown will be beneficial for my work and I'm ho= ping to see it land.


I was playing around with v9 of the patches= and was specifically curious about this previous statement...

>This patch also makes eager a= ggregation work with outer joins.=C2=A0 With
>outer join, the aggrega= te cannot be pushed down if any column referenced
>by grouping expres= sions or aggregate functions is nullable by an outer
>join above the = relation to which we want to apply the partiall
>aggregation.=C2=A0 T= hanks to Tom's outer-join-aware-Var infrastructure, we
>can easil= y identify such situations and subsequently refrain from
>pushing dow= n the aggregates.

=C2=A0...and this related comment in eager_aggregate.out:

>-- En= sure aggregation cannot be pushed down to the nullable side

While I&= #39;m new to this work and its subtleties, I'm wondering if this is too= broad a condition.

I modified the first test query in eager_aggrega= te.sql to make it a LEFT JOIN and eager aggregation indeed did not happen, = which is expected based on the comments upthread.

query:
SET enab= le_eager_aggregate=3DON;
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, su= m(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b =3D t2.b GRO= UP BY t1.a ORDER BY t1.a;

plan:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0QUERY PLAN =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0
------------------------------------------------------------
= =C2=A0GroupAggregate
=C2=A0 =C2=A0Output: t1.a, sum(t2.c)
=C2=A0 =C2= =A0Group Key: t1.a
=C2=A0 =C2=A0-> =C2=A0Sort
=C2=A0 =C2=A0 =C2=A0= =C2=A0 =C2=A0Output: t1.a, t2.c
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Sort = Key: t1.a
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Hash Right Join<= br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Output: t1.a, t2.= c
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Hash Cond: (t2.= b =3D t1.b)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0->= =C2=A0Seq Scan on public.eager_agg_t2 t2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Output: t2.a, t2.b, t2.c=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Hash=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0Output: t1.a, t1.b
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0-> =C2=A0Seq Scan on public.eager_agg_t1 t1=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0Output: t1.a, t1.b
(15 rows)

(NOTE: I cha= nged the aggregate from avg(...) to sum(...) for simplicity)

But, it= seems that eager aggregation for the query above can be "replicated&q= uot; as:

query:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, = sum(t2.c)
FROM eager_agg_t1 t1
LEFT JOIN (
=C2=A0 =C2=A0 SELECT b,= sum(c) c
=C2=A0 =C2=A0 FROM eager_agg_t2 t2p
=C2=A0 =C2=A0 GROUP BY = b
) t2 ON t1.b =3D t2.b
GROUP BY t1.a
ORDER BY t1.a;

The ou= tput of both the original query and this one match (and the plans with eage= r aggregation and the subquery are nearly identical if you restore the LEFT= JOIN to a JOIN). I admittedly may be missing a subtlety, but does this mea= n that there are conditions under which eager aggregation can be pushed dow= n to the nullable side?


-Paul-

On Sat, Jul 6, 2024 at 4:56=E2=80=AFPM Richard Guo &= lt;guofenglinux@gmail.com>= wrote:
On Thu, = Jun 13, 2024 at 4:07=E2=80=AFPM Richard Guo <guofenglinux@gmail.com> wrote:
> I spent some time testing this patchset and found a few more issues. > ...

> Hence here is the v8 patchset, with fixes for all the above issues.
I found an 'ORDER/GROUP BY expression not found in targetlist' erro= r
with this patchset, with the query below:

create table t (a boolean);

set enable_eager_aggregate to on;

explain (costs off)
select min(1) from t t1 left join t t2 on t1.a group by (not (not
t1.a)), t1.a order by t1.a;
ERROR:=C2=A0 ORDER/GROUP BY expression not found in targetlist

This happens because the two grouping items are actually the same and
standard_qp_callback would remove one of them.=C2=A0 The fully-processed groupClause is kept in root->processed_groupClause.=C2=A0 However, when<= br> collecting grouping expressions in create_grouping_expr_infos, we are
checking parse->groupClause, which is incorrect.

The fix is straightforward: check root->processed_groupClause instead.
Here is a new rebase with this fix.

Thanks
Richard
--000000000000fc99f6061c9f4a95--