public inbox for [email protected]
help / color / mirror / Atom feedFrom: Matheus Alcantara <[email protected]>
To: Richard Guo <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Tender Wang <[email protected]>
Cc: Paul George <[email protected]>
Cc: Andy Fan <[email protected]>
Cc: PostgreSQL-development <[email protected]>
Cc: [email protected]
Subject: Re: Eager aggregation, take 3
Date: Thu, 14 Aug 2025 16:22:49 -0300
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAMbWs4_VtGu18P-jWMXAp3Q+mzGDCaT8AhxQDyWv2_rUxsjv8A@mail.gmail.com>
References: <CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com>
<[email protected]>
<CAMbWs49=eAd2W9jCtGhaZPPp+SOC_2rg16RTG74xAht=hkr5JQ@mail.gmail.com>
<CAMbWs49Nc4M3H+eCf1+8w8piDyEECjRb-gK_JMF4VvcyWwGEVQ@mail.gmail.com>
<CAMbWs49E_dR0nobsExsyetpnBpHObLTsQLsEbWKQLkh0omPxNg@mail.gmail.com>
<CAMbWs49B_qUiHvu2EqLHZRpLr3p_+QPBs50n2=L5ibYzniwTzA@mail.gmail.com>
<CAMbWs48KCQtDymnYi4M=Vz+WMzo3fkBxffJsyk6VX6hOXXv+VA@mail.gmail.com>
<CAMbWs49sv_MuOYqqrtmBN_oYf8VSQ2BXDwXaTpJTn_YfwyYdWQ@mail.gmail.com>
<CAMbWs49U8Sddx_fGszPdvA3jp_nheynxaqm5Y4NqMV21VBYAuQ@mail.gmail.com>
<CAMbWs4-LwyOg9ga+NVF7yQbMi0ZsZdN1G_sO2v=YJHV18=19+A@mail.gmail.com>
<CALA8mJquG_zCJXfVwash5LKqHGtZXQmq7RfTSaRDUzGYeW=7Rw@mail.gmail.com>
<CAMbWs4_EjgcBib5+y1LYcGB3EK3Y6R+OOxGKfJo42fDovadk1g@mail.gmail.com>
<CALA8mJqe0anNM8_V6cOeOQnCHUTQggn7iOQNyQr1VaN_xMjz+w@mail.gmail.com>
<CAMbWs48eE-s-jCicC8pSVfXk8Ws-ZvUKnsw8qH-DkVBdYv0eJQ@mail.gmail.com>
<CAMbWs483a7-8M0pDttG44r-+8Gevn9VG0xNceE3WpkEQxJXPZw@mail.gmail.com>
<CAHewXNmYM6DvR_kaxDL0w0fz9BwKbac+TSU3QS10aA3cXHyMmA@mail.gmail.com>
<CA+TgmoaxH=P63hLYgyJJcEbMRnw3xi16d=HxFi1j-m7MhH6W_w@mail.gmail.com>
<CAMbWs4_cOnpGsywj9Jt1WAgzJLW9Rxt5X13cfGz4iN2qvZQ68g@mail.gmail.com>
<CA+Tgmob0q7bRbsFTVDMjxHE6zA4uDQLQa-s0CtwUw49V53UL_A@mail.gmail.com>
<CAMbWs4-Xru_eKBeRHFduigSGihdixFWVTR8A+dtMw7Mao+RkJA@mail.gmail.com>
<CAMbWs49dLjSSQRWeud+KSN0G531ciZdYoLBd5qktXA+3JQm_UQ@mail.gmail.com>
<CAMbWs48LXGC-Y63YtzEeM-3f0NUXWCUEMs7XwGzywXTjUNMcxQ@mail.gmail.com>
<CAMbWs48XdzvnwfTHWxQ7qK-yjvdrbwsPpqhJBuKDnO+hcbsVwA@mail.gmail.com>
<CA+TgmoaO-7RHdyJuizWChXZm7EJGvDcfoePDDEyUA-y8vTB1tg@mail.gmail.com>
<CAMbWs4-+jXRpKuFMZa08bS34-TBka3qqjVMAUjF=-1RA9BKvgg@mail.gmail.com>
<CA+TgmoZapU1y59-s3o8oPt7Hv+cxRh_34FMu6MXumomLe+U1Cw@mail.gmail.com>
<CAMbWs4_sEeeBmucBzbamBMfA9uLxVmOc_MV=ZpSyDbTcrUO_XQ@mail.gmail.com>
<CA+Tgmob4fnv57PQB0Oox86mHSJQ0vVL249eT=gqPvrMkG7h1zw@mail.gmail.com>
<CAMbWs489NYyTcCTbrUi7hPXKtNY5vHrrFcHyMRAv=CA5WsszVw@mail.gmail.com>
<CA+TgmoazmDdcc7NeTo3WM5HW3DASNP4rfZw6X+2nnQKHampOng@mail.gmail.com>
<CAMbWs49bYr-ULhA+-At0iQ+NaFKy72AWB6jzughk8MPTiY+gMQ@mail.gmail.com>
<CA+TgmoYa-zexdbc5nO_D6oxPMZYs06hkYwZK5Dufq+4Hhe6uNQ@mail.gmail.com>
<CAMbWs4_aji0kME490phz6nTXnPToddUn19OF3rLm1g4TbNkuzQ@mail.gmail.com>
<CA+Tgmoa3+G_=8XuQWN+0ugv6r-WV6ruFESpOxpXAAKrne3oVDQ@mail.gmail.com>
<CAMbWs49qiox13EKb7bqgLu7Gu9oar+xe6KMwBjgFwod3JzPfUw@mail.gmail.com>
<CAMbWs48F8WGA-Lzj1Dk76mFqRFxPEwG2_9Zb7+pFs8oi6ew2pw@mail.gmail.com>
<CAMbWs484ms=WRZamOyWnVditREKFqipLsdaQjcv2uKur8SZuqw@mail.gmail.com>
<CAMbWs49bL2ZMSc0W4G8=R7bjaa-vO6grucEOFYLZFUZE7+nzrQ@mail.gmail.com>
<[email protected]>
<CAMbWs4_VtGu18P-jWMXAp3Q+mzGDCaT8AhxQDyWv2_rUxsjv8A@mail.gmail.com>
On 08/08/25 22:32, Richard Guo wrote:
>> It sounds like a good way to go for me, looking forward to the next
>> patch version to perform some other tests.
>
> OK. Here it is.
>
Thanks! I can confirm now that I can see the eager aggregate in action
in some of these queries that I've tested on the TPC-DS benchmark.
I few questions regarding the new version:
I've noticed that when a query has a WHERE clause filtering columns from
the same relation being aggregated using "=" operator the Partial and
Finalize aggregation nodes are not present on explain results even if
setup_eager_aggregation() returns true on all if statements and also
RelAggInfo->agg_useful is true. For example, consider this query that is
used on eager aggregation paper that use some tables from TPC-H
benchmark:
tpch=# show enable_eager_aggregate ;
enable_eager_aggregate
------------------------
on
(1 row)
tpch=# set max_parallel_workers_per_gather to 0;
SET
tpch=# EXPLAIN(COSTS OFF) SELECT O_CLERK,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS LOSS
FROM LINEITEM
JOIN ORDERS ON L_ORDERKEY = O_ORDERKEY
WHERE L_RETURNFLAG = 'R'
GROUP BY O_CLERK;
QUERY PLAN
--------------------------------------------------------------
HashAggregate
Group Key: orders.o_clerk
-> Hash Join
Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
-> Seq Scan on lineitem
Filter: (l_returnflag = 'R'::bpchar)
-> Hash
-> Seq Scan on orders
(8 rows)
Debugging this query shows that all if conditions on
setup_eager_aggregation() returns false and create_agg_clause_infos()
and create_grouping_expr_infos() are called. The RelAggInfo->agg_useful
is also being set to true so I would expect to see Finalize and Partial
agg nodes, is this correct or am I missing something here?
Removing the WHERE clause I can see the Finalize and Partial agg nodes:
tpch=# EXPLAIN(COSTS OFF) SELECT O_CLERK,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS LOSS
FROM LINEITEM
JOIN ORDERS ON L_ORDERKEY = O_ORDERKEY
GROUP BY O_CLERK;
QUERY PLAN
----------------------------------------------------------------------
Finalize HashAggregate
Group Key: orders.o_clerk
-> Merge Join
Merge Cond: (lineitem.l_orderkey = orders.o_orderkey)
-> Partial GroupAggregate
Group Key: lineitem.l_orderkey
-> Index Scan using idx_lineitem_orderkey on lineitem
-> Index Scan using orders_pkey on orders
(8 rows)
This can also be reproduced with an addition of a WHERE clause on some
tests on eager_aggregate.sql:
postgres=# EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, avg(t2.c)
FROM eager_agg_t1 t1
JOIN eager_agg_t2 t2
ON t1.b = t2.b
WHERE t2.c = 5
GROUP BY t1.a
ORDER BY t1.a;
QUERY PLAN
------------------------------------------------------------------
GroupAggregate
Output: t1.a, avg(t2.c)
Group Key: t1.a
-> Sort
Output: t1.a, t2.c
Sort Key: t1.a
-> Hash Join
Output: t1.a, t2.c
Hash Cond: (t1.b = t2.b)
-> Seq Scan on public.eager_agg_t1 t1
Output: t1.a, t1.b, t1.c
-> Hash
Output: t2.c, t2.b
-> Seq Scan on public.eager_agg_t2 t2
Output: t2.c, t2.b
Filter: (t2.c = '5'::double precision)
(16 rows)
Note that if I use ">" operator for example, this doesn't happen:
SELECT t1.a, avg(t2.c)
FROM eager_agg_t1 t1
JOIN eager_agg_t2 t2
ON t1.b = t2.b
WHERE t2.c > 5
GROUP BY t1.a
ORDER BY t1.a;
QUERY PLAN
------------------------------------------------------------------------
Finalize GroupAggregate
Output: t1.a, avg(t2.c)
Group Key: t1.a
-> Sort
Output: t1.a, (PARTIAL avg(t2.c))
Sort Key: t1.a
-> Hash Join
Output: t1.a, (PARTIAL avg(t2.c))
Hash Cond: (t1.b = t2.b)
-> Seq Scan on public.eager_agg_t1 t1
Output: t1.a, t1.b, t1.c
-> Hash
Output: t2.b, (PARTIAL avg(t2.c))
-> Partial HashAggregate
Output: t2.b, PARTIAL avg(t2.c)
Group Key: t2.b
-> Seq Scan on public.eager_agg_t2 t2
Output: t2.a, t2.b, t2.c
Filter: (t2.c > '5'::double precision)
(19 rows)
Is this behavior correct? If it's correct, would be possible to check
this limitation on setup_eager_aggregation() and maybe skip all the
other work?
--
Matheus Alcantara
view thread (75+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Eager aggregation, take 3
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox