public inbox for [email protected]  
help / color / mirror / Atom feed
Eager aggregation, take 3
6+ messages / 3 participants
[nested] [flat]

* Eager aggregation, take 3
@ 2024-03-04 08:27  Richard Guo <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Richard Guo @ 2024-03-04 08:27 UTC (permalink / raw)
  To: pgsql-hackers

Hi All,

Eager aggregation is a query optimization technique that partially
pushes a group-by past a join, and finalizes it once all the relations
are joined.  Eager aggregation reduces the number of input rows to the
join and thus may result in a better overall plan.  This technique is
thoroughly described in the 'Eager Aggregation and Lazy Aggregation'
paper [1].

Back in 2017, a patch set has been proposed by Antonin Houska to
implement eager aggregation in thread [2].  However, it was at last
withdrawn after entering the pattern of "please rebase thx" followed by
rebasing and getting no feedback until "please rebase again thx".  A
second attempt in 2022 unfortunately fell into the same pattern about
one year ago and was eventually closed again [3].

That patch set has included most of the necessary concepts to implement
eager aggregation.  However, as far as I can see, it has several weak
points that we need to address.  It introduces invasive changes to some
core planner functions, such as make_join_rel().  And with such changes
join_is_legal() would be performed three times for the same proposed
join, which is not great.  Another weak point is that the complexity of
join searching dramatically increases with the growing number of
relations to be joined.  This occurs because when we generate partially
aggregated paths, each path of the input relation is considered as an
input path for the grouped paths.  As a result, the number of grouped
paths we generate increases exponentially, leading to a significant
explosion in computational complexity.  Other weak points include the
lack of support for outer joins and partitionwise joins.  And during my
review of the code, I came across several bugs (planning error or crash)
that need to be addressed.

I'd like to give it another take to implement eager aggregation, while
borrowing lots of concepts and many chunks of codes from the previous
patch set.  Please see attached.  I have chosen to use the term 'Eager
Aggregation' from the paper [1] instead of 'Aggregation push-down', to
differentiate the aggregation push-down technique in FDW.

The patch has been split into small pieces to make the review easier.

0001 introduces the RelInfoList structure, which encapsulates both a
list and a hash table, so that we can leverage the hash table for faster
lookups not only for join relations but also for upper relations.  With
eager aggregation, it is possible that we generate so many upper rels of
type UPPERREL_PARTIAL_GROUP_AGG that a hash table can help a lot with
lookups.

0002 introduces the RelAggInfo structure to store information needed to
create grouped paths for base and join rels.  It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.

0003 checks if eager aggregation is applicable, and if so, collects
suitable aggregate expressions and grouping expressions in the query,
and records them in root->agg_clause_list and root->group_expr_list
respectively.

0004 implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create RelAggInfo structure
for the relation, using the infos about aggregate expressions and
grouping expressions we collected earlier.  In this patch, when we check
if a target expression can act as grouping expression, we need to check
if this expression can be known equal to other expressions due to ECs
that can act as grouping expressions.  This patch leverages function
exprs_known_equal() to achieve that, after enhancing this function to
consider opfamily if provided.

0005 implements the functions that generate paths for grouped relations
by adding sorted and hashed partial aggregation paths on top of paths of
the plain base or join relations.  For sorted partial aggregation paths,
we only consider any suitably-sorted input paths as well as sorting the
cheapest-total path.  For hashed partial aggregation paths, we only
consider the cheapest-total path as input.  By not considering other
paths we can reduce the number of grouping paths as much as possible
while still achieving reasonable results.

0006 builds grouped relations for each base relation if possible, and
generates aggregation paths for the grouped base relations.

0007 builds grouped relations for each just-processed join relation if
possible, and generates aggregation paths for the grouped join
relations.  The changes made to make_join_rel() are relatively minor,
with the addition of a new function make_grouped_join_rel(), which finds
or creates a grouped relation for the just-processed joinrel, and
generates grouped paths by joining a grouped input relation with a
non-grouped input relation.

The other way to generate grouped paths is by adding sorted and hashed
partial aggregation paths on top of paths of the joinrel.  This occurs
in standard_join_search(), after we've run set_cheapest() for the
joinrel.  The reason for performing this step after set_cheapest() is
that we need to know the joinrel's cheapest paths (see 0005).

This patch also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that.  Additionally, this
patch extends the functionality of eager aggregation to work with
partitionwise join and geqo.

This patch also makes eager aggregation work with outer joins.  With
outer join, the aggregate cannot be pushed down if any column referenced
by grouping expressions or aggregate functions is nullable by an outer
join above the relation to which we want to apply the partiall
aggregation.  Thanks to Tom's outer-join-aware-Var infrastructure, we
can easily identify such situations and subsequently refrain from
pushing down the aggregates.

Starting from this patch, you should be able to see plans with eager
aggregation.

0008 adds test cases for eager aggregation.

0009 adds a section in README that describes this feature (copied from
previous patch set, with minor tweaks).

Thoughts and comments are welcome.

[1] https://www.vldb.org/conf/1995/P345.PDF
[2] https://www.postgresql.org/message-id/flat/9666.1491295317%40localhost
[3]
https://www.postgresql.org/message-id/flat/OS3PR01MB66609589B896FBDE190209F495EE9%40OS3PR01MB6660.jp...

Thanks
Richard


Attachments:

  [application/octet-stream] v1-0001-Introduce-RelInfoList-structure.patch (14.3K, 3-v1-0001-Introduce-RelInfoList-structure.patch)
  download | inline diff:
From 542f02eb98b84dad9990c03bef792bb3e816fd23 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Mon, 19 Feb 2024 15:16:51 +0800
Subject: [PATCH v1 1/9] Introduce RelInfoList structure

This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
 contrib/postgres_fdw/postgres_fdw.c    |   3 +-
 src/backend/optimizer/geqo/geqo_eval.c |  20 +--
 src/backend/optimizer/path/allpaths.c  |   7 +-
 src/backend/optimizer/plan/planmain.c  |   5 +-
 src/backend/optimizer/util/relnode.c   | 164 ++++++++++++++-----------
 src/include/nodes/pathnodes.h          |  31 +++--
 6 files changed, 133 insertions(+), 97 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 142dcfc995..f46fc604b4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
 	 */
 	Assert(fpinfo->relation_index == 0);	/* shouldn't be set yet */
 	fpinfo->relation_index =
-		list_length(root->parse->rtable) + list_length(root->join_rel_list);
+		list_length(root->parse->rtable) +
+		list_length(root->join_rel_list->items);
 
 	return true;
 }
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
 	 * truncating the list to its original length.  NOTE this assumes that any
 	 * added entries are appended at the end!
 	 *
-	 * We also must take care not to mess up the outer join_rel_hash, if there
-	 * is one.  We can do this by just temporarily setting the link to NULL.
-	 * (If we are dealing with enough join rels, which we very likely are, a
-	 * new hash table will get built and used locally.)
+	 * We also must take care not to mess up the outer join_rel_list->hash, if
+	 * there is one.  We can do this by just temporarily setting the link to
+	 * NULL.  (If we are dealing with enough join rels, which we very likely
+	 * are, a new hash table will get built and used locally.)
 	 *
 	 * join_rel_level[] shouldn't be in use, so just Assert it isn't.
 	 */
-	savelength = list_length(root->join_rel_list);
-	savehash = root->join_rel_hash;
+	savelength = list_length(root->join_rel_list->items);
+	savehash = root->join_rel_list->hash;
 	Assert(root->join_rel_level == NULL);
 
-	root->join_rel_hash = NULL;
+	root->join_rel_list->hash = NULL;
 
 	/* construct the best path for the given combination of relations */
 	joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
 	 * Restore join_rel_list to its former state, and put back original
 	 * hashtable if any.
 	 */
-	root->join_rel_list = list_truncate(root->join_rel_list,
-										savelength);
-	root->join_rel_hash = savehash;
+	root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+											   savelength);
+	root->join_rel_list->hash = savehash;
 
 	/* release all the memory acquired within gimme_tree */
 	MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d404fbf262..351bf2e9e4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3413,9 +3413,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
  * needed for these paths need have been instantiated.
  *
  * Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash.  If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures.  See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash.  If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures.  See geqo_eval() for
+ * an example.
  */
 RelOptInfo *
 standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ca47c7d310..3341e64d2b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -65,8 +65,9 @@ query_planner(PlannerInfo *root,
 	 * NOTE: append_rel_list was set up by subquery_planner, so do not touch
 	 * here.
 	 */
-	root->join_rel_list = NIL;
-	root->join_rel_hash = NULL;
+	root->join_rel_list = makeNode(RelInfoList);
+	root->join_rel_list->items = NIL;
+	root->join_rel_list->hash = NULL;
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e5f4062bfb..9e25750acd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
 #include "utils/lsyscache.h"
 
 
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookup for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
 {
-	Relids		join_relids;	/* hash key --- MUST BE FIRST */
-	RelOptInfo *join_rel;
-} JoinHashEntry;
+	Relids		relids;			/* hash key --- MUST BE FIRST */
+	RelOptInfo *rel;
+} RelInfoEntry;
 
 static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
 								RelOptInfo *input_rel,
@@ -472,11 +476,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
 }
 
 /*
- * build_join_rel_hash
- *	  Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ *	  Construct the auxiliary hash table for relations.
  */
 static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
 {
 	HTAB	   *hashtab;
 	HASHCTL		hash_ctl;
@@ -484,47 +488,49 @@ build_join_rel_hash(PlannerInfo *root)
 
 	/* Create the hash table */
 	hash_ctl.keysize = sizeof(Relids);
-	hash_ctl.entrysize = sizeof(JoinHashEntry);
+	hash_ctl.entrysize = sizeof(RelInfoEntry);
 	hash_ctl.hash = bitmap_hash;
 	hash_ctl.match = bitmap_match;
 	hash_ctl.hcxt = CurrentMemoryContext;
-	hashtab = hash_create("JoinRelHashTable",
+	hashtab = hash_create("RelHashTable",
 						  256L,
 						  &hash_ctl,
 						  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
 
-	/* Insert all the already-existing joinrels */
-	foreach(l, root->join_rel_list)
+	/* Insert all the already-existing relations */
+	foreach(l, list->items)
 	{
 		RelOptInfo *rel = (RelOptInfo *) lfirst(l);
-		JoinHashEntry *hentry;
+		RelInfoEntry *hentry;
 		bool		found;
 
-		hentry = (JoinHashEntry *) hash_search(hashtab,
-											   &(rel->relids),
-											   HASH_ENTER,
-											   &found);
+		hentry = (RelInfoEntry *) hash_search(hashtab,
+											  &(rel->relids),
+											  HASH_ENTER,
+											  &found);
 		Assert(!found);
-		hentry->join_rel = rel;
+		hentry->rel = rel;
 	}
 
-	root->join_rel_hash = hashtab;
+	list->hash = hashtab;
 }
 
 /*
- * find_join_rel
- *	  Returns relation entry corresponding to 'relids' (a set of RT indexes),
- *	  or NULL if none exists.  This is for join relations.
+ * find_rel_info
+ *	  Find an RelOptInfo entry.
  */
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
 {
+	if (list == NULL)
+		return NULL;
+
 	/*
 	 * Switch to using hash lookup when list grows "too long".  The threshold
 	 * is arbitrary and is known only here.
 	 */
-	if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
-		build_join_rel_hash(root);
+	if (!list->hash && list_length(list->items) > 32)
+		build_rel_hash(list);
 
 	/*
 	 * Use either hashtable lookup or linear search, as appropriate.
@@ -534,23 +540,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
 	 * so would force relids out of a register and thus probably slow down the
 	 * list-search case.
 	 */
-	if (root->join_rel_hash)
+	if (list->hash)
 	{
 		Relids		hashkey = relids;
-		JoinHashEntry *hentry;
+		RelInfoEntry *hentry;
 
-		hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
-											   &hashkey,
-											   HASH_FIND,
-											   NULL);
+		hentry = (RelInfoEntry *) hash_search(list->hash,
+											  &hashkey,
+											  HASH_FIND,
+											  NULL);
 		if (hentry)
-			return hentry->join_rel;
+			return hentry->rel;
 	}
 	else
 	{
 		ListCell   *l;
 
-		foreach(l, root->join_rel_list)
+		foreach(l, list->items)
 		{
 			RelOptInfo *rel = (RelOptInfo *) lfirst(l);
 
@@ -562,6 +568,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
 	return NULL;
 }
 
+/*
+ * find_join_rel
+ *	  Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ *	  or NULL if none exists.  This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+	return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ *		Add given relation to the given list. Also add it to the auxiliary
+ *		hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+	/* GEQO requires us to append the new relation to the end of the list! */
+	list->items = lappend(list->items, rel);
+
+	/* store it into the auxiliary hashtable if there is one. */
+	if (list->hash)
+	{
+		RelInfoEntry *hentry;
+		bool		found;
+
+		hentry = (RelInfoEntry *) hash_search(list->hash,
+											  &(rel->relids),
+											  HASH_ENTER,
+											  &found);
+		Assert(!found);
+		hentry->rel = rel;
+	}
+}
+
+/*
+ * add_join_rel
+ *		Add given join relation to the list of join relations in the given
+ *		PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+	add_rel_info(root->join_rel_list, joinrel);
+}
+
 /*
  * set_foreign_rel_properties
  *		Set up foreign-join fields if outer and inner relation are foreign
@@ -611,32 +665,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
 	}
 }
 
-/*
- * add_join_rel
- *		Add given join relation to the list of join relations in the given
- *		PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
-	/* GEQO requires us to append the new joinrel to the end of the list! */
-	root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
-	/* store it into the auxiliary hashtable if there is one. */
-	if (root->join_rel_hash)
-	{
-		JoinHashEntry *hentry;
-		bool		found;
-
-		hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
-											   &(joinrel->relids),
-											   HASH_ENTER,
-											   &found);
-		Assert(!found);
-		hentry->join_rel = joinrel;
-	}
-}
-
 /*
  * build_join_rel
  *	  Returns relation entry corresponding to the union of two given rels,
@@ -1462,22 +1490,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
 RelOptInfo *
 fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
 {
+	RelInfoList *list = &root->upper_rels[kind];
 	RelOptInfo *upperrel;
-	ListCell   *lc;
-
-	/*
-	 * For the moment, our indexing data structure is just a List for each
-	 * relation kind.  If we ever get so many of one kind that this stops
-	 * working well, we can improve it.  No code outside this function should
-	 * assume anything about how to find a particular upperrel.
-	 */
 
 	/* If we already made this upperrel for the query, return it */
-	foreach(lc, root->upper_rels[kind])
+	if (list)
 	{
-		upperrel = (RelOptInfo *) lfirst(lc);
-
-		if (bms_equal(upperrel->relids, relids))
+		upperrel = find_rel_info(list, relids);
+		if (upperrel)
 			return upperrel;
 	}
 
@@ -1496,7 +1516,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
 	upperrel->cheapest_unique_path = NULL;
 	upperrel->cheapest_parameterized_paths = NIL;
 
-	root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+	add_rel_info(&root->upper_rels[kind], upperrel);
 
 	return upperrel;
 }
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 534692bee1..be51e2c652 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
 	/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
 } UpperRelationKind;
 
+/*
+ * Hashed list to store relation specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups.  The hash table is
+ * present and valid when 'hash' is not NULL.  Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+	pg_node_attr(no_copy_equal, no_read)
+
+	NodeTag		type;
+
+	List	   *items;
+	struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
 /*----------
  * PlannerGlobal
  *		Global information for planning/optimization
@@ -267,15 +286,9 @@ struct PlannerInfo
 
 	/*
 	 * join_rel_list is a list of all join-relation RelOptInfos we have
-	 * considered in this planning run.  For small problems we just scan the
-	 * list to do lookups, but when there are many join relations we build a
-	 * hash table for faster lookups.  The hash table is present and valid
-	 * when join_rel_hash is not NULL.  Note that we still maintain the list
-	 * even when using the hash table for lookups; this simplifies life for
-	 * GEQO.
+	 * considered in this planning run.
 	 */
-	List	   *join_rel_list;
-	struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+	RelInfoList *join_rel_list;	/* list of join-relation RelOptInfos */
 
 	/*
 	 * When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -408,7 +421,7 @@ struct PlannerInfo
 	 * Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
 	 * upper rel.
 	 */
-	List	   *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+	RelInfoList		upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);;
 
 	/* Result tlists chosen by grouping_planner for upper-stage processing */
 	struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
-- 
2.31.0



  [application/octet-stream] v1-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch (13.1K, 4-v1-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch)
  download | inline diff:
From 6b3b7a944bbb018e77dd8e4b787b9c660a9ed69b Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 14:19:39 +0800
Subject: [PATCH v1 5/9] Implement functions that generate paths for grouped
 relations

This commit implements the functions that generate paths for grouped
relations by adding sorted and hashed partial aggregation paths on top
of paths of the plain base or join relations.
---
 src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++
 src/backend/optimizer/util/pathnode.c |  12 +-
 src/include/optimizer/paths.h         |   4 +
 3 files changed, 315 insertions(+), 8 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 9384c54ed9..f47ad04846 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -41,6 +41,7 @@
 #include "optimizer/paths.h"
 #include "optimizer/plancat.h"
 #include "optimizer/planner.h"
+#include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
 #include "parser/parse_clause.h"
@@ -50,6 +51,7 @@
 #include "port/pg_bitutils.h"
 #include "rewrite/rewriteManip.h"
 #include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
 
 
 /* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -3306,6 +3308,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
 	}
 }
 
+/*
+ * generate_grouped_paths
+ *		Generate paths for a grouped relation by adding sorted and hashed
+ *		partial aggregation paths on top of paths of the plain base or join
+ *		relation.
+ *
+ * The information needed are provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+					   RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+	AggClauseCosts agg_costs;
+	bool		can_hash;
+	bool		can_sort;
+	Path	   *cheapest_total_path = NULL;
+	Path	   *cheapest_partial_path = NULL;
+	double		dNumGroups = 0;
+	double		dNumPartialGroups = 0;
+
+	if (IS_DUMMY_REL(rel_plain))
+	{
+		mark_dummy_rel(rel_grouped);
+		return;
+	}
+
+	MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+	get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+	/*
+	 * Determine whether it's possible to perform sort-based implementations of
+	 * grouping.
+	 */
+	can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+	/*
+	 * Determine whether we should consider hash-based implementations of
+	 * grouping.
+	 */
+	Assert(root->numOrderedAggs == 0);
+	can_hash = (agg_info->group_clauses != NIL &&
+				grouping_is_hashable(agg_info->group_clauses));
+
+	/*
+	 * Consider whether we should generate partially aggregated non-partial
+	 * paths.  We can only do this if we have a non-partial path.
+	 */
+	if (rel_plain->pathlist != NIL)
+	{
+		cheapest_total_path = rel_plain->cheapest_total_path;
+		Assert(cheapest_total_path != NULL);
+	}
+
+	/*
+	 * If parallelism is possible for rel_grouped, then we should consider
+	 * generating partially-grouped partial paths.  However, if the plain rel
+	 * has no partial paths, then we can't.
+	 */
+	if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+	{
+		cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+		Assert(cheapest_partial_path != NULL);
+	}
+
+	/* Estimate number of partial groups. */
+	if (cheapest_total_path != NULL)
+		dNumGroups = estimate_num_groups(root,
+										 agg_info->group_exprs,
+										 cheapest_total_path->rows,
+										 NULL, NULL);
+	if (cheapest_partial_path != NULL)
+		dNumPartialGroups = estimate_num_groups(root,
+												agg_info->group_exprs,
+												cheapest_partial_path->rows,
+												NULL, NULL);
+
+	if (can_sort && cheapest_total_path != NULL)
+	{
+		ListCell   *lc;
+
+		/*
+		 * Use any available suitably-sorted path as input, and also consider
+		 * sorting the cheapest-total path.
+		 */
+		foreach(lc, rel_plain->pathlist)
+		{
+			Path	   *input_path = (Path *) lfirst(lc);
+			Path	   *path;
+			bool		is_sorted;
+			int			presorted_keys;
+
+			/*
+			 * Since the path originates from the non-grouped relation which is
+			 * not aware of eager aggregation, we must ensure that it provides
+			 * the correct input for the partial aggregation.
+			 */
+			path = (Path *) create_projection_path(root,
+												   rel_grouped,
+												   input_path,
+												   agg_info->agg_input);
+
+			is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+													path->pathkeys,
+													&presorted_keys);
+			if (!is_sorted)
+			{
+				/*
+				 * Try at least sorting the cheapest path and also try
+				 * incrementally sorting any path which is partially sorted
+				 * already (no need to deal with paths which have presorted
+				 * keys when incremental sort is disabled unless it's the
+				 * cheapest input path).
+				 */
+				if (input_path != cheapest_total_path &&
+					(presorted_keys == 0 || !enable_incremental_sort))
+					continue;
+
+				/*
+				 * We've no need to consider both a sort and incremental sort.
+				 * We'll just do a sort if there are no presorted keys and an
+				 * incremental sort when there are presorted keys.
+				 */
+				if (presorted_keys == 0 || !enable_incremental_sort)
+					path = (Path *) create_sort_path(root,
+													 rel_grouped,
+													 path,
+													 agg_info->group_pathkeys,
+													 -1.0);
+				else
+					path = (Path *) create_incremental_sort_path(root,
+																 rel_grouped,
+																 path,
+																 agg_info->group_pathkeys,
+																 presorted_keys,
+																 -1.0);
+			}
+
+			/*
+			 * qual is NIL because the HAVING clause cannot be evaluated until the
+			 * final value of the aggregate is known.
+			 */
+			path = (Path *) create_agg_path(root,
+											rel_grouped,
+											path,
+											agg_info->target,
+											AGG_SORTED,
+											AGGSPLIT_INITIAL_SERIAL,
+											agg_info->group_clauses,
+											NIL,
+											&agg_costs,
+											dNumGroups);
+
+			add_path(rel_grouped, path);
+		}
+	}
+
+	if (can_sort && cheapest_partial_path != NULL)
+	{
+		ListCell   *lc;
+
+		/* Similar to above logic, but for partial paths. */
+		foreach(lc, rel_plain->partial_pathlist)
+		{
+			Path	   *input_path = (Path *) lfirst(lc);
+			Path	   *path;
+			bool		is_sorted;
+			int			presorted_keys;
+
+			/*
+			 * Since the path originates from the non-grouped relation which is
+			 * not aware of eager aggregation, we must ensure that it provides
+			 * the correct input for the partial aggregation.
+			 */
+			path = (Path *) create_projection_path(root,
+												   rel_grouped,
+												   input_path,
+												   agg_info->agg_input);
+
+			is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+													path->pathkeys,
+													&presorted_keys);
+
+			if (!is_sorted)
+			{
+				/*
+				 * Try at least sorting the cheapest path and also try
+				 * incrementally sorting any path which is partially sorted
+				 * already (no need to deal with paths which have presorted
+				 * keys when incremental sort is disabled unless it's the
+				 * cheapest input path).
+				 */
+				if (input_path != cheapest_partial_path &&
+					(presorted_keys == 0 || !enable_incremental_sort))
+					continue;
+
+				/*
+				 * We've no need to consider both a sort and incremental sort.
+				 * We'll just do a sort if there are no presorted keys and an
+				 * incremental sort when there are presorted keys.
+				 */
+				if (presorted_keys == 0 || !enable_incremental_sort)
+					path = (Path *) create_sort_path(root,
+													 rel_grouped,
+													 path,
+													 agg_info->group_pathkeys,
+													 -1.0);
+				else
+					path = (Path *) create_incremental_sort_path(root,
+																 rel_grouped,
+																 path,
+																 agg_info->group_pathkeys,
+																 presorted_keys,
+																 -1.0);
+			}
+
+			/*
+			 * qual is NIL because the HAVING clause cannot be evaluated until the
+			 * final value of the aggregate is known.
+			 */
+			path = (Path *) create_agg_path(root,
+											rel_grouped,
+											path,
+											agg_info->target,
+											AGG_SORTED,
+											AGGSPLIT_INITIAL_SERIAL,
+											agg_info->group_clauses,
+											NIL,
+											&agg_costs,
+											dNumPartialGroups);
+
+			add_partial_path(rel_grouped, path);
+		}
+	}
+
+	/*
+	 * Add a partially-grouped HashAgg Path where possible
+	 */
+	if (can_hash && cheapest_total_path != NULL)
+	{
+		Path   *path;
+
+		/*
+		 * Since the path originates from the non-grouped relation which is
+		 * not aware of eager aggregation, we must ensure that it provides
+		 * the correct input for the partial aggregation.
+		 */
+		path = (Path *) create_projection_path(root,
+											   rel_grouped,
+											   cheapest_total_path,
+											   agg_info->agg_input);
+
+		/*
+		 * qual is NIL because the HAVING clause cannot be evaluated until
+		 * the final value of the aggregate is known.
+		 */
+		path = (Path *) create_agg_path(root,
+										rel_grouped,
+										path,
+										agg_info->target,
+										AGG_HASHED,
+										AGGSPLIT_INITIAL_SERIAL,
+										agg_info->group_clauses,
+										NIL,
+										&agg_costs,
+										dNumGroups);
+
+		add_path(rel_grouped, path);
+	}
+
+	/*
+	 * Now add a partially-grouped HashAgg partial Path where possible
+	 */
+	if (can_hash && cheapest_partial_path != NULL)
+	{
+		Path   *path;
+
+		/*
+		 * Since the path originates from the non-grouped relation which is
+		 * not aware of eager aggregation, we must ensure that it provides
+		 * the correct input for the partial aggregation.
+		 */
+		path = (Path *) create_projection_path(root,
+											   rel_grouped,
+											   cheapest_partial_path,
+											   agg_info->agg_input);
+
+		/*
+		 * qual is NIL because the HAVING clause cannot be evaluated until
+		 * the final value of the aggregate is known.
+		 */
+		path = (Path *) create_agg_path(root,
+										rel_grouped,
+										path,
+										agg_info->target,
+										AGG_HASHED,
+										AGGSPLIT_INITIAL_SERIAL,
+										agg_info->group_clauses,
+										NIL,
+										&agg_costs,
+										dNumPartialGroups);
+
+		add_partial_path(rel_grouped, path);
+	}
+}
+
 /*
  * make_rel_from_joinlist
  *	  Build access paths using a "joinlist" to guide the join path search.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 6f79b2e3fe..ee455f7ec2 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2699,8 +2699,7 @@ create_projection_path(PlannerInfo *root,
 	pathnode->path.pathtype = T_Result;
 	pathnode->path.parent = rel;
 	pathnode->path.pathtarget = target;
-	/* For now, assume we are above any joins, so no parameterization */
-	pathnode->path.param_info = NULL;
+	pathnode->path.param_info = subpath->param_info;
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel &&
 		subpath->parallel_safe &&
@@ -2952,8 +2951,7 @@ create_incremental_sort_path(PlannerInfo *root,
 	pathnode->path.parent = rel;
 	/* Sort doesn't project, so use source path's pathtarget */
 	pathnode->path.pathtarget = subpath->pathtarget;
-	/* For now, assume we are above any joins, so no parameterization */
-	pathnode->path.param_info = NULL;
+	pathnode->path.param_info = subpath->param_info;
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel &&
 		subpath->parallel_safe;
@@ -2999,8 +2997,7 @@ create_sort_path(PlannerInfo *root,
 	pathnode->path.parent = rel;
 	/* Sort doesn't project, so use source path's pathtarget */
 	pathnode->path.pathtarget = subpath->pathtarget;
-	/* For now, assume we are above any joins, so no parameterization */
-	pathnode->path.param_info = NULL;
+	pathnode->path.param_info = subpath->param_info;
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel &&
 		subpath->parallel_safe;
@@ -3158,8 +3155,7 @@ create_agg_path(PlannerInfo *root,
 	pathnode->path.pathtype = T_Agg;
 	pathnode->path.parent = rel;
 	pathnode->path.pathtarget = target;
-	/* For now, assume we are above any joins, so no parameterization */
-	pathnode->path.param_info = NULL;
+	pathnode->path.param_info = subpath->param_info;
 	pathnode->path.parallel_aware = false;
 	pathnode->path.parallel_safe = rel->consider_parallel &&
 		subpath->parallel_safe;
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index dcea10888b..68fc05432c 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
 								  bool override_rows);
 extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
 										 bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+								   RelOptInfo *rel_grouped,
+								   RelOptInfo *rel_plain,
+								   RelAggInfo *agg_info);
 extern int	compute_parallel_worker(RelOptInfo *rel, double heap_pages,
 									double index_pages, int max_workers);
 extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
-- 
2.31.0



  [application/octet-stream] v1-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch (26.1K, 5-v1-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch)
  download | inline diff:
From a7658376eb1461132627825f4deabb73a4e53d1d Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 11:27:49 +0800
Subject: [PATCH v1 4/9] Implement functions that create RelAggInfos if
 applicable

This commit implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create RelAggInfo structure
for the relation, using the infos about aggregate expressions and
grouping expressions we collected earlier.
---
 src/backend/optimizer/path/equivclass.c |  26 +-
 src/backend/optimizer/plan/planmain.c   |   3 +
 src/backend/optimizer/util/relnode.c    | 624 ++++++++++++++++++++++++
 src/backend/utils/adt/selfuncs.c        |   5 +-
 src/include/nodes/pathnodes.h           |   6 +
 src/include/optimizer/pathnode.h        |   5 +
 src/include/optimizer/paths.h           |   3 +-
 7 files changed, 662 insertions(+), 10 deletions(-)

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 4bd60a09c6..1890dbb852 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2439,15 +2439,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
  *	  Detect whether two expressions are known equal due to equivalence
  *	  relationships.
  *
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with).  If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
  *
  * Note: does not bother to check for "equal(item1, item2)"; caller must
  * check that case if it's possible to pass identical items.
  */
 bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
 {
 	ListCell   *lc1;
 
@@ -2462,6 +2464,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
 		if (ec->ec_has_volatile)
 			continue;
 
+		/*
+		 * It's okay to consider ec_broken ECs here.  Brokenness just means we
+		 * couldn't derive all the implied clauses we'd have liked to; it does
+		 * not invalidate our knowledge that the members are equal.
+		 */
+
+		/* Ignore if this EC doesn't use specified opfamily */
+		if (OidIsValid(opfamily) &&
+			!list_member_oid(ec->ec_opfamilies, opfamily))
+			continue;
+
 		foreach(lc2, ec->ec_members)
 		{
 			EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2490,8 +2503,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
  * (In principle there might be more than one matching eclass if multiple
  * collations are involved, but since collation doesn't matter for equality,
  * we ignore that fine point here.)  This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
  *
  * On success, we also set fkinfo->eclass[colno] to the matching eclass,
  * and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2532,7 +2544,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
 		/* Never match to a volatile EC */
 		if (ec->ec_has_volatile)
 			continue;
-		/* Note: it seems okay to match to "broken" eclasses here */
+		/* It's okay to consider "broken" ECs here, see exprs_known_equal */
 
 		foreach(lc2, ec->ec_members)
 		{
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 8b8def21ca..db66a3e189 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -68,6 +68,9 @@ query_planner(PlannerInfo *root,
 	root->join_rel_list = makeNode(RelInfoList);
 	root->join_rel_list->items = NIL;
 	root->join_rel_list->hash = NULL;
+	root->agg_info_list = makeNode(RelInfoList);
+	root->agg_info_list->items = NIL;
+	root->agg_info_list->hash = NULL;
 	root->join_rel_level = NULL;
 	root->join_cur_level = 0;
 	root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c88da963db..e7f465ef7b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -87,6 +87,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
 									   RelOptInfo *childrel,
 									   int nappinfos,
 									   AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+													RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+								  PathTarget *target, PathTarget *agg_input,
+								  List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
 
 
 /*
@@ -640,6 +648,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
 	add_rel_info(root->join_rel_list, joinrel);
 }
 
+/*
+ * add_grouped_rel
+ *		Add grouped base or join relation to the list of grouped relations in
+ *		the given PlannerInfo.  Also add the corresponding RelAggInfo to
+ *		root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+	add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+	add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ *	  Returns grouped relation entry (base or join relation) corresponding to
+ *	  'relids' or NULL if none exists.
+ *
+ * If agg_info_p is not NULL, then also the corresponding RelAggInfo (if one
+ * exists) will be returned in *agg_info_p.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+	RelOptInfo *rel;
+
+	rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+									   relids);
+	if (rel == NULL)
+	{
+		if (agg_info_p)
+			*agg_info_p = NULL;
+
+		return NULL;
+	}
+
+	/* also return the corresponding RelAggInfo, if asked */
+	if (agg_info_p)
+	{
+		RelAggInfo *agg_info;
+
+		agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+		/* The relation exists, so the agg_info should be there too. */
+		Assert(agg_info != NULL);
+
+		*agg_info_p = agg_info;
+	}
+
+	return rel;
+}
+
 /*
  * set_foreign_rel_properties
  *		Set up foreign-join fields if outer and inner relation are foreign
@@ -2464,3 +2524,567 @@ build_child_join_reltarget(PlannerInfo *root,
 	childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
 	childrel->reltarget->width = parentrel->reltarget->width;
 }
+
+/*
+ * create_rel_agg_info
+ *	  Check if the given relation can produce grouped paths and return the
+ *	  information it'll need for it.  The given relation is the non-grouped one
+ *	  which has the reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+	ListCell   *lc;
+	RelAggInfo *result;
+	PathTarget *agg_input;
+	PathTarget *target;
+	List	   *grp_exprs_extra = NIL;
+	List	   *group_clauses_final;
+	int			i;
+
+	/*
+	 * The lists of aggregate expressions and grouping expressions should have
+	 * been constructed.
+	 */
+	Assert(root->agg_clause_list != NIL);
+	Assert(root->group_expr_list != NIL);
+
+	/*
+	 * If this is a child rel, the grouped rel for its parent rel must have
+	 * been created if it can.  So we can just use parent's RelAggInfo if there
+	 * is one, with appropriate variable substitutions.
+	 */
+	if (IS_OTHER_REL(rel))
+	{
+		RelOptInfo	   *rel_grouped;
+		RelAggInfo	   *agg_info;
+		Relids			parent_relids = NULL;
+		AppendRelInfo **appinfos;
+		int				nappinfos;
+		int				cnt;
+
+		appinfos = find_appinfos_by_relids(root, rel->relids, &nappinfos);
+
+		for (cnt = 0; cnt < nappinfos; cnt++)
+			parent_relids = bms_add_member(parent_relids,
+										   appinfos[cnt]->parent_relid);
+
+		Assert(!bms_is_empty(parent_relids));
+		rel_grouped = find_grouped_rel(root, parent_relids, &agg_info);
+
+		if (rel_grouped == NULL)
+			return NULL;
+
+		Assert(agg_info != NULL);
+
+		agg_info = (RelAggInfo *) adjust_appendrel_attrs(root,
+														 (Node *) agg_info,
+														 nappinfos,
+														 appinfos);
+
+		pfree(appinfos);
+
+		agg_info->input_rows = rel->rows;
+		agg_info->grouped_rows =
+			estimate_num_groups(root, agg_info->group_exprs,
+								agg_info->input_rows, NULL, NULL);
+
+		return agg_info;
+	}
+
+	/* Check if it's possible to produce grouped paths for this relation. */
+	if (!eager_aggregation_possible_for_relation(root, rel))
+		return NULL;
+
+	/*
+	 * Create targets for the grouped paths and for the input paths of the
+	 * grouped paths.
+	 */
+	target = create_empty_pathtarget();
+	agg_input = create_empty_pathtarget();
+
+	/* initialize 'target' and 'agg_input' */
+	if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra))
+		return NULL;
+
+	/* Eager aggregation makes no sense w/o grouping expressions */
+	if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+		return NULL;
+
+	group_clauses_final = root->parse->groupClause;
+
+	/*
+	 * If the aggregation target should have extra grouping expressions (in
+	 * order to emit input vars for join conditions), add them now. This step
+	 * includes assignment of tleSortGroupRef's which we can generate now.
+	 */
+	if (list_length(grp_exprs_extra) > 0)
+	{
+		Index		sortgroupref;
+
+		/*
+		 * Make a copy of the group clauses as we'll need to add some more
+		 * clauses.
+		 */
+		group_clauses_final = list_copy(group_clauses_final);
+
+		/* find out the current max sortgroupref */
+		sortgroupref = 0;
+		foreach(lc, root->processed_tlist)
+		{
+			Index		ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+			if (ref > sortgroupref)
+				sortgroupref = ref;
+		}
+
+		/*
+		 * Generate the SortGroupClause's and add the expressions to the
+		 * target.
+		 */
+		foreach(lc, grp_exprs_extra)
+		{
+			Var		   *var = lfirst_node(Var, lc);
+			SortGroupClause *cl = makeNode(SortGroupClause);
+
+			/*
+			 * Initialize the SortGroupClause.
+			 *
+			 * As the final aggregation will not use this grouping expression,
+			 * we don't care whether sortop is < or >. The value of nulls_first
+			 * should not matter for the same reason.
+			 */
+			cl->tleSortGroupRef = ++sortgroupref;
+			get_sort_group_operators(var->vartype,
+									 false, true, false,
+									 &cl->sortop, &cl->eqop, NULL,
+									 &cl->hashable);
+			group_clauses_final = lappend(group_clauses_final, cl);
+			add_column_to_pathtarget(target, (Expr *) var,
+									 cl->tleSortGroupRef);
+
+			/*
+			 * The aggregation input target must emit this var too.
+			 */
+			add_column_to_pathtarget(agg_input, (Expr *) var,
+									 cl->tleSortGroupRef);
+		}
+	}
+
+	/*
+	 * Build a list of grouping expressions and a list of the corresponding
+	 * SortGroupClauses.
+	 */
+	i = 0;
+	result = makeNode(RelAggInfo);
+	foreach(lc, target->exprs)
+	{
+		Index		sortgroupref = 0;
+		SortGroupClause *cl;
+		Expr	   *texpr;
+
+		texpr = (Expr *) lfirst(lc);
+
+		Assert(IsA(texpr, Var));
+
+		sortgroupref = target->sortgrouprefs[i++];
+		if (sortgroupref == 0)
+			continue;
+
+		/* find the SortGroupClause in group_clauses_final */
+		cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+		/* do not add this SortGroupClause if it has already been added */
+		if (list_member(result->group_clauses, cl))
+			continue;
+
+		result->group_clauses = lappend(result->group_clauses, cl);
+		result->group_exprs = list_append_unique(result->group_exprs,
+												 texpr);
+	}
+
+	/*
+	 * Calculate pathkeys that represent this grouping requirements.
+	 */
+	result->group_pathkeys =
+		make_pathkeys_for_sortclauses(root, result->group_clauses,
+									  make_tlist_from_pathtarget(target));
+
+	/*
+	 * Add aggregates to the grouping target.
+	 */
+	foreach(lc, root->agg_clause_list)
+	{
+		AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+		Aggref   *aggref;
+
+		Assert(IsA(ac_info->aggref, Aggref));
+
+		aggref = (Aggref *) copyObject(ac_info->aggref);
+		mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+		add_column_to_pathtarget(target, (Expr *) aggref, 0);
+
+		result->agg_exprs = lappend(result->agg_exprs, aggref);
+	}
+
+	/*
+	 * Since neither target nor agg_input is supposed to be identical to the
+	 * source reltarget, compute the width and cost again.
+	 */
+	set_pathtarget_cost_width(root, target);
+	set_pathtarget_cost_width(root, agg_input);
+
+	result->relids = bms_copy(rel->relids);
+	result->target = target;
+	result->agg_input = agg_input;
+
+	/*
+	 * The number of aggregation input rows is simply the number of rows of the
+	 * non-grouped relation, which should have been estimated by now.
+	 */
+	result->input_rows = rel->rows;
+
+	/* Estimate the number of groups with equal grouped exprs. */
+	result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+											   result->input_rows, NULL, NULL);
+
+	return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * 	  Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+	ListCell   *lc;
+
+	/*
+	 * The current implementation of eager aggregation cannot handle
+	 * PlaceHolderVar (PHV).
+	 *
+	 * If we knew that the PHV should be evaluated in this target (and of
+	 * course, if its expression matched some Aggref argument), we'd just let
+	 * init_grouping_targets add that Aggref. On the other hand, if we knew
+	 * that the PHV is evaluated below the current rel, we could ignore it
+	 * because the referencing Aggref would take care of propagation of the
+	 * value to upper joins.
+	 *
+	 * The problem is that the same PHV can be evaluated in the target of the
+	 * current rel or in that of lower rel --- depending on the input paths.
+	 * For example, consider rel->relids = {A, B, C} and if ph_eval_at = {B,
+	 * C}. Path "A JOIN (B JOIN C)" implies that the PHV is evaluated by the
+	 * "(B JOIN C)", while path "(A JOIN B) JOIN C" evaluates the PHV itself.
+	 */
+	foreach(lc, rel->reltarget->exprs)
+	{
+		Expr	   *expr = lfirst(lc);
+
+		if (IsA(expr, PlaceHolderVar))
+			return false;
+	}
+
+	if (IS_SIMPLE_REL(rel))
+	{
+		RangeTblEntry *rte = root->simple_rte_array[rel->relid];;
+
+		/*
+		 * rtekind != RTE_RELATION case is not supported yet.
+		 */
+		if (rte->rtekind != RTE_RELATION)
+			return false;
+	}
+
+	/* Caller should only pass base relations or joins. */
+	Assert(rel->reloptkind == RELOPT_BASEREL ||
+		   rel->reloptkind == RELOPT_JOINREL);
+
+	/*
+	 * Check if all aggregate expressions can be evaluated on this relation
+	 * level.
+	 */
+	foreach(lc, root->agg_clause_list)
+	{
+		AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+		Assert(IsA(ac_info->aggref, Aggref));
+
+		/*
+		 * Give up if any aggregate needs relations other than the current one.
+		 *
+		 * If the aggregate needs the current rel plus anything else, then the
+		 * problem is that grouping of the current relation could make some
+		 * input variables unavailable for the "higher aggregate", and it'd
+		 * also decrease the number of input rows the "higher aggregate"
+		 * receives.
+		 *
+		 * If the aggregate does not even need the current rel, then the
+		 * current rel should be grouped because we do not support join of two
+		 * grouped relations.
+		 */
+		if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+			return false;
+	}
+
+	return true;
+}
+
+/*
+ * init_grouping_targets
+ *	  Initialize target for grouped paths (target) as well as a target for
+ *	  paths that generate input for the grouped paths (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+					  PathTarget *target, PathTarget *agg_input,
+					  List **group_exprs_extra_p)
+{
+	ListCell   *lc;
+	List	   *possibly_dependent = NIL;
+
+	foreach(lc, rel->reltarget->exprs)
+	{
+		Expr	   *expr = (Expr *) lfirst(lc);
+		Index		sortgroupref;
+
+		/*
+		 * Given that PlaceHolderVar currently prevents us from doing eager
+		 * aggregation, the source target cannot contain anything more complex
+		 * than a Var.
+		 */
+		Assert(IsA(expr, Var));
+
+		/* Get the sortgroupref if the expr can act as grouping expression. */
+		sortgroupref = get_expression_sortgroupref(root, expr);
+		if (sortgroupref > 0)
+		{
+			/*
+			 * If the target expression can be used as the grouping key, it
+			 * should be emitted by the grouped paths that have been pushed
+			 * down to this relation level.
+			 */
+			add_column_to_pathtarget(target, expr, sortgroupref);
+
+			/*
+			 * ... and it also should be emitted by the input paths
+			 */
+			add_column_to_pathtarget(agg_input, expr, sortgroupref);
+		}
+		else
+		{
+			if (is_var_needed_by_join(root, (Var *) expr, rel))
+			{
+				/*
+				 * The variable is needed for a join, however it's neither in
+				 * the GROUP BY clause nor can it be derived from it using EC.
+				 * (Otherwise it would have to be added to the targets above.)
+				 * We need to construct special SortGroupClause for this
+				 * variable.
+				 *
+				 * Note that its tleSortGroupRef needs to be unique within
+				 * agg_input, so we need to postpone creation of the
+				 * SortGroupClause's until we're done with the iteration of
+				 * rel->reltarget->exprs. Also it makes sense for the caller to
+				 * do some more check before it starts to create those
+				 * SortGroupClause's.
+				 */
+				*group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+			}
+			else if (is_var_in_aggref_only(root, (Var *) expr))
+			{
+				/*
+				 * Another reason we might need this variable is that some
+				 * aggregate pushed down to this relation references it. In
+				 * such a case, add it to "agg_input", but not to "target".
+				 * However, if the aggregate is not the only reason for the var
+				 * to be in the target, some more checks need to be performed
+				 * below.
+				 */
+				add_new_column_to_pathtarget(agg_input, expr);
+			}
+			else
+			{
+				/*
+				 * The Var can be functionally dependent on another expression
+				 * of the target, but we cannot check that until we've built
+				 * all the expressions for the target.
+				 */
+				possibly_dependent = lappend(possibly_dependent, expr);
+			}
+		}
+	}
+
+	/*
+	 * Now we can check whether the expression is functionally dependent on
+	 * another one.
+	 */
+	foreach(lc, possibly_dependent)
+	{
+		Var		   *tvar;
+		List	   *deps = NIL;
+		RangeTblEntry *rte;
+
+		tvar = lfirst_node(Var, lc);
+		rte = root->simple_rte_array[tvar->varno];
+
+		/*
+		 * Check if the Var can be in the grouping key even though it's not
+		 * mentioned by the GROUP BY clause (and could not be derived using
+		 * ECs).
+		 */
+		if (check_functional_grouping(rte->relid, tvar->varno,
+									  tvar->varlevelsup,
+									  target->exprs, &deps))
+		{
+			/*
+			 * The var shouldn't be actually used for grouping key evaluation
+			 * (instead, the one this depends on will be), so sortgroupref
+			 * should not be important.
+			 */
+			add_new_column_to_pathtarget(target, (Expr *) tvar);
+			add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+		}
+		else
+		{
+			/*
+			 * As long as the query is semantically correct, arriving here
+			 * means that the var is referenced by a generic grouping
+			 * expression but not referenced by any join.
+			 *
+			 * If the eager aggregation will support generic grouping
+			 * expression in the future, create_rel_agg_info() will have to add
+			 * this variable to "agg_input" target and also add the whole
+			 * generic expression to "target".
+			 */
+			return false;
+		}
+	}
+
+	return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ *	  Check whether given Var appears in Aggref(s) which we consider usable at
+ *	  relation / join level, and only in the Aggref(s).
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+	ListCell   *lc;
+
+	foreach(lc, root->agg_clause_list)
+	{
+		AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+		Assert(IsA(ac_info->aggref, Aggref));
+
+		if (bms_is_member(var->varno, ac_info->agg_eval_at))
+			break;;
+	}
+
+	/* No aggregate references the Var? */
+	if (lc == NULL)
+		return false;
+
+	/* Does the Var appear in the target outside aggregates? */
+	foreach(lc, root->processed_tlist)
+	{
+		TargetEntry *tle = lfirst_node(TargetEntry, lc);
+		List	   *vars;
+
+		if (IsA(tle->expr, Aggref))
+			continue;
+
+		vars = pull_var_clause((Node *) tle->expr,
+							   PVC_RECURSE_WINDOWFUNCS |
+							   PVC_RECURSE_PLACEHOLDERS);
+		if (list_member(vars, var))
+		{
+			list_free(vars);
+			return false;
+		}
+
+		list_free(vars);
+	}
+
+	/* The Var is in aggregate(s) and only there. */
+	return true;
+}
+
+/*
+ * is_var_needed_by_join
+ *	  Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ *    SELECT a.i, avg(b.y)
+ *    FROM a JOIN b ON a.j = b.j
+ *    GROUP BY a.i;
+ *
+ * Column b.j needs to be used as the grouping key because otherwise it cannot
+ * find its way to the input of the join expression.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+	Relids		relids;
+	int			attno;
+	RelOptInfo *baserel;
+
+	/*
+	 * Note that when we are checking if the Var is needed by joins above, we
+	 * want to exclude the situation where the Var is only needed in final
+	 * output.  So include "relation 0" here.
+	 */
+	relids = bms_copy(rel->relids);
+	relids = bms_add_member(relids, 0);
+
+	baserel = find_base_rel(root, var->varno);
+	attno = var->varattno - baserel->min_attr;
+
+
+	return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ *	  Return sortgroupref if the given 'expr' can be used as a grouping
+ *	  expression in grouped paths for base or join relations, or 0 otherwise.
+ *
+ * Note that we also need to check if the 'expr' is known equal to other exprs
+ * due to equivalence relationships that can act as grouping expressions.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+	ListCell   *lc;
+
+	foreach(lc, root->group_expr_list)
+	{
+		GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+		Assert(IsA(ge_info->expr, Var));
+
+		if (equal(ge_info->expr, expr) ||
+			exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+							  ge_info->btree_opfamily))
+		{
+			Assert(ge_info->sortgroupref > 0);
+
+			return ge_info->sortgroupref;
+		}
+	}
+
+	/* The expression cannot be used as grouping key. */
+	return 0;
+}
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cea777e9d4..d1365229f7 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
 
 		/*
 		 * Drop known-equal vars, but only if they belong to different
-		 * relations (see comments for estimate_num_groups)
+		 * relations (see comments for estimate_num_groups).  We aren't too
+		 * fussy about the semantics of "equal" here.
 		 */
 		if (vardata->rel != varinfo->rel &&
-			exprs_known_equal(root, var, varinfo->var))
+			exprs_known_equal(root, var, varinfo->var, InvalidOid))
 		{
 			if (varinfo->ndistinct <= ndistinct)
 			{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 69ed9eb1f6..3ef5195323 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -429,6 +429,12 @@ struct PlannerInfo
 	 */
 	RelInfoList		upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);;
 
+	/*
+	 * list of grouped relation RelAggInfos. One instance of RelAggInfo per
+	 * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+	 */
+	RelInfoList *agg_info_list;
+
 	/* Result tlists chosen by grouping_planner for upper-stage processing */
 	struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
 
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index c43d97b48a..8d03ce2c57 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -310,6 +310,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
 extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
 extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
 extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+							RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+									RelAggInfo **agg_info_p);
 extern RelOptInfo *build_join_rel(PlannerInfo *root,
 								  Relids joinrelids,
 								  RelOptInfo *outer_rel,
@@ -344,4 +348,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
 										RelOptInfo *parent_joinrel, List *restrictlist,
 										SpecialJoinInfo *sjinfo);
 
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
 #endif							/* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 040a047b81..dcea10888b 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -160,7 +160,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
 													  Relids join_relids,
 													  Relids outer_relids,
 													  RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+							  Oid opfamily);
 extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
 														   ForeignKeyOptInfo *fkinfo,
 														   int colno);
-- 
2.31.0



  [application/octet-stream] v1-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch (7.8K, 6-v1-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch)
  download | inline diff:
From efad6c39e247078c6d3cdf3cf8561bd5d35004e6 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 11:12:18 +0800
Subject: [PATCH v1 2/9] Introduce RelAggInfo structure to store info for
 grouped paths.

This commit introduces RelAggInfo structure to store information needed
to create grouped paths for base and join rels.  It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
---
 src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++--------
 src/include/nodes/pathnodes.h        | 73 ++++++++++++++++++++++++++++
 2 files changed, 118 insertions(+), 21 deletions(-)

diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 9e25750acd..c88da963db 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -36,13 +36,13 @@
 
 
 /*
- * An entry of a hash table that we use to make lookup for RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookup for RelOptInfo or
+ * RelAggInfo structures more efficient.
  */
 typedef struct RelInfoEntry
 {
 	Relids		relids;			/* hash key --- MUST BE FIRST */
-	RelOptInfo *rel;
+	void	   *data;
 } RelInfoEntry;
 
 static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -477,7 +477,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
 
 /*
  * build_rel_hash
- *	  Construct the auxiliary hash table for relations.
+ *	  Construct the auxiliary hash table for relation specific data.
  */
 static void
 build_rel_hash(RelInfoList *list)
@@ -497,19 +497,27 @@ build_rel_hash(RelInfoList *list)
 						  &hash_ctl,
 						  HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
 
-	/* Insert all the already-existing relations */
+	/* Insert all the already-existing relation specific infos */
 	foreach(l, list->items)
 	{
-		RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+		void	   *item = lfirst(l);
 		RelInfoEntry *hentry;
 		bool		found;
+		Relids		relids;
+
+		Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+		if (IsA(item, RelOptInfo))
+			relids = ((RelOptInfo *) item)->relids;
+		else
+			relids = ((RelAggInfo *) item)->relids;
 
 		hentry = (RelInfoEntry *) hash_search(hashtab,
-											  &(rel->relids),
+											  &relids,
 											  HASH_ENTER,
 											  &found);
 		Assert(!found);
-		hentry->rel = rel;
+		hentry->data = item;
 	}
 
 	list->hash = hashtab;
@@ -517,9 +525,9 @@ build_rel_hash(RelInfoList *list)
 
 /*
  * find_rel_info
- *	  Find an RelOptInfo entry.
+ *	  Find an RelOptInfo or a RelAggInfo entry.
  */
-static RelOptInfo *
+static void *
 find_rel_info(RelInfoList *list, Relids relids)
 {
 	if (list == NULL)
@@ -550,7 +558,7 @@ find_rel_info(RelInfoList *list, Relids relids)
 											  HASH_FIND,
 											  NULL);
 		if (hentry)
-			return hentry->rel;
+			return hentry->data;
 	}
 	else
 	{
@@ -558,10 +566,18 @@ find_rel_info(RelInfoList *list, Relids relids)
 
 		foreach(l, list->items)
 		{
-			RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+			void	   *item = lfirst(l);
+			Relids		item_relids = NULL;
+
+			Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
 
-			if (bms_equal(rel->relids, relids))
-				return rel;
+			if (IsA(item, RelOptInfo))
+				item_relids = ((RelOptInfo *) item)->relids;
+			else if (IsA(item, RelAggInfo))
+				item_relids = ((RelAggInfo *) item)->relids;
+
+			if (bms_equal(item_relids, relids))
+				return item;
 		}
 	}
 
@@ -576,32 +592,40 @@ find_rel_info(RelInfoList *list, Relids relids)
 RelOptInfo *
 find_join_rel(PlannerInfo *root, Relids relids)
 {
-	return find_rel_info(root->join_rel_list, relids);
+	return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
 }
 
 /*
  * add_rel_info
- *		Add given relation to the given list. Also add it to the auxiliary
+ *		Add relation specific info to a list, and also add it to the auxiliary
  *		hashtable if there is one.
  */
 static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
 {
+	Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
 	/* GEQO requires us to append the new relation to the end of the list! */
-	list->items = lappend(list->items, rel);
+	list->items = lappend(list->items, data);
 
 	/* store it into the auxiliary hashtable if there is one. */
 	if (list->hash)
 	{
+		Relids		relids;
 		RelInfoEntry *hentry;
 		bool		found;
 
+		if (IsA(data, RelOptInfo))
+			relids = ((RelOptInfo *) data)->relids;
+		else
+			relids = ((RelAggInfo *) data)->relids;
+
 		hentry = (RelInfoEntry *) hash_search(list->hash,
-											  &(rel->relids),
+											  &relids,
 											  HASH_ENTER,
 											  &found);
 		Assert(!found);
-		hentry->rel = rel;
+		hentry->data = data;
 	}
 }
 
@@ -1496,7 +1520,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
 	/* If we already made this upperrel for the query, return it */
 	if (list)
 	{
-		upperrel = find_rel_info(list, relids);
+		upperrel = (RelOptInfo *) find_rel_info(list, relids);
 		if (upperrel)
 			return upperrel;
 	}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index be51e2c652..d67f725ad6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1065,6 +1065,79 @@ typedef struct RelOptInfo
 	((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
 	 (rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
 
+/*
+ * RelAggInfo
+ *		Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes), just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * base relation or join. The same target will also --- if the relation is a
+ * join --- be used to join grouped path to a non-grouped one.  This target can
+ * contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Iterations of ->exprs may rely on this
+ * arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from reltarget of the non-grouped relation
+ * is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. not initialized
+ * when instance of the structure is created.
+ *
+ * "grouped_rows" is the estimated number of result rows of the AggPath.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ */
+typedef struct RelAggInfo
+{
+	pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+	NodeTag		type;
+
+	/*
+	 * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes)
+	 */
+	Relids		relids;
+
+	/*
+	 * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs,
+	 * cost, width
+	 */
+	struct PathTarget *target;
+
+	/*
+	 * the targetlist for Paths that generate input for the grouped paths 
+	 */
+	struct PathTarget *agg_input;
+
+	/* estimated number of input tuples for the grouped paths */
+	Cardinality		input_rows;
+
+	/* estimated number of result tuples of the grouped relation*/
+	Cardinality		grouped_rows;
+
+	/* a list of SortGroupClause's */
+	List	   *group_clauses;
+	/* a list of grouping expressions */
+	List	   *group_exprs;
+	/* a list of PathKeys */
+	List	   *group_pathkeys;
+
+	/* a list of Aggref nodes */
+	List	   *agg_exprs;
+} RelAggInfo;
+
 /*
  * IndexOptInfo
  *		Per-index information for planning/optimization
-- 
2.31.0



  [application/octet-stream] v1-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch (14.3K, 7-v1-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch)
  download | inline diff:
From 9798f1f4e4d1e6aef6b712df452fc5f14e736292 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 18:40:46 +0800
Subject: [PATCH v1 3/9] Set up for eager aggregation by collecting needed
 infos

This commit checks if eager aggregation is applicable, and if so, sets
up root->agg_clause_list and root->group_expr_list by collecting
suitable aggregate expressions and grouping expressions in the query.
---
 src/backend/optimizer/path/allpaths.c         |   1 +
 src/backend/optimizer/plan/initsplan.c        | 250 ++++++++++++++++++
 src/backend/optimizer/plan/planmain.c         |   8 +
 src/backend/utils/misc/guc_tables.c           |  10 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/nodes/pathnodes.h                 |  41 +++
 src/include/optimizer/paths.h                 |   1 +
 src/include/optimizer/planmain.h              |   1 +
 src/test/regress/expected/sysviews.out        |   3 +-
 9 files changed, 315 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 351bf2e9e4..9384c54ed9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -80,6 +80,7 @@ typedef enum pushdown_safe_type
 
 /* These parameters are set by GUC */
 bool		enable_geqo = false;	/* just in case GUC doesn't set it */
+bool		enable_eager_aggregate = false;
 int			geqo_threshold;
 int			min_parallel_table_scan_size;
 int			min_parallel_index_scan_size;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index d4a9d77d7f..36c82bd696 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
  */
 #include "postgres.h"
 
+#include "access/nbtree.h"
 #include "catalog/pg_class.h"
 #include "catalog/pg_type.h"
 #include "nodes/makefuncs.h"
@@ -81,6 +82,8 @@ typedef struct JoinTreeItem
 } JoinTreeItem;
 
 
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
 static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
 									   Index rtindex);
 static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -328,6 +331,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
 	}
 }
 
+/*
+ * setup_eager_aggregation
+ *	  Check if eager aggregation is applicable, and if so collect suitable
+ *	  aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+	/*
+	 * Don't apply eager aggregation if disabled by user.
+	 */
+	if (!enable_eager_aggregate)
+		return;
+
+	/*
+	 * Don't apply eager aggregation if there are no GROUP BY clauses.
+	 */
+	if (!root->parse->groupClause)
+		return;
+
+	/*
+	 * For now we don't try to support grouping sets.
+	 */
+	if (root->parse->groupingSets)
+		return;
+
+	/*
+	 * For now we don't try to support DISTINCT or ORDER BY aggregates.
+	 */
+	if (root->numOrderedAggs > 0)
+		return;
+
+	/*
+	 * If there are any aggregates that do not support partial mode, or any
+	 * partial aggregates that are non-serializable, do not apply eager
+	 * aggregation.
+	 */
+	if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+		return;
+
+	/*
+	 * SRF is not allowed in the aggregate argument and we don't even want it
+	 * in the GROUP BY clause, so forbid it in general. It needs to be
+	 * analyzed if evaluation of a GROUP BY clause containing SRF below the
+	 * query targetlist would be correct. Currently it does not seem to be an
+	 * important use case.
+	 */
+	if (root->parse->hasTargetSRFs)
+		return;
+
+	/*
+	 * Collect aggregate expressions that appear in targetlist and having
+	 * clauses.
+	 */
+	create_agg_clause_infos(root);
+
+	/*
+	 * If there are no suitable aggregate expressions, we cannot apply eager
+	 * aggregation.
+	 */
+	if (root->agg_clause_list == NIL)
+		return;
+
+	/*
+	 * Collect grouping expressions that appear in grouping clauses.
+	 */
+	create_grouping_expr_infos(root);
+}
+
+/*
+ * Create AggClauseInfo for each aggregate.
+ *
+ * If any aggregate is not suitable, set root->agg_clause_list to NIL and
+ * return.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+	List	   *tlist_exprs;
+	ListCell   *lc;
+
+	Assert(root->agg_clause_list == NIL);
+
+	tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+								  PVC_INCLUDE_AGGREGATES |
+								  PVC_RECURSE_WINDOWFUNCS |
+								  PVC_RECURSE_PLACEHOLDERS);
+
+	/*
+	 * For now we don't try to support GROUPING() expressions.
+	 */
+	foreach(lc, tlist_exprs)
+	{
+		Expr	   *expr = (Expr *) lfirst(lc);
+
+		if (IsA(expr, GroupingFunc))
+			return;
+	}
+
+	/*
+	 * Aggregates within the HAVING clause need to be processed in the same way
+	 * as those in the targetlist.  Note that HAVING can contain Aggrefs but
+	 * not WindowFuncs.
+	 */
+	if (root->parse->havingQual != NULL)
+	{
+		List	   *having_exprs;
+
+		having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+									   PVC_INCLUDE_AGGREGATES |
+									   PVC_RECURSE_PLACEHOLDERS);
+		if (having_exprs != NIL)
+		{
+			tlist_exprs = list_concat(tlist_exprs, having_exprs);
+			list_free(having_exprs);
+		}
+	}
+
+	foreach(lc, tlist_exprs)
+	{
+		Expr	   *expr = (Expr *) lfirst(lc);
+		Aggref	   *aggref;
+		AggClauseInfo *ac_info;
+
+		/*
+		 * tlist_exprs may also contain Vars, but we only need Aggrefs.
+		 */
+		if (IsA(expr, Var))
+			continue;
+
+		aggref = castNode(Aggref, expr);
+
+		Assert(aggref->aggorder == NIL);
+		Assert(aggref->aggdistinct == NIL);
+
+		ac_info = makeNode(AggClauseInfo);
+		ac_info->aggref = aggref;
+		ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+		root->agg_clause_list =
+			list_append_unique(root->agg_clause_list, ac_info);
+	}
+
+	list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, set root->group_expr_list to NIL
+ * and return.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+	List	   *exprs = NIL;
+	List	   *sortgrouprefs = NIL;
+	List	   *btree_opfamilies = NIL;
+	ListCell   *lc,
+			   *lc1,
+			   *lc2,
+			   *lc3;
+
+	Assert(root->group_expr_list == NIL);
+
+	foreach(lc, root->parse->groupClause)
+	{
+		SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+		TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+		TypeCacheEntry *tce;
+		Oid				equalimageproc;
+		Oid				eq_op;
+		List		   *eq_opfamilies;
+		Oid				btree_opfamily;
+
+		Assert(tle->ressortgroupref > 0);
+
+		/*
+		 * For now we only support plain Vars as grouping expressions.
+		 */
+		if (!IsA(tle->expr, Var))
+			return;
+
+		/*
+		 * Eager aggregation is only possible if equality of grouping keys
+		 * per the equality operator implies bitwise equality. Otherwise, if
+		 * we put keys of different byte images into the same group, we lose
+		 * some information that may be needed to evaluate join clauses above
+		 * the pushed-down aggregate node, or the WHERE clause.
+		 *
+		 * For example, the NUMERIC data type is not supported because values
+		 * that fall into the same group according to the equality operator
+		 * (e.g. 0 and 0.0) can have different scale.
+		 */
+		tce = lookup_type_cache(exprType((Node *) tle->expr),
+								TYPECACHE_BTREE_OPFAMILY);
+		if (!OidIsValid(tce->btree_opf) ||
+			!OidIsValid(tce->btree_opintype))
+			return;
+
+		equalimageproc = get_opfamily_proc(tce->btree_opf,
+										   tce->btree_opintype,
+										   tce->btree_opintype,
+										   BTEQUALIMAGE_PROC);
+		if (!OidIsValid(equalimageproc) ||
+			!DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+											   tce->typcollation,
+											   ObjectIdGetDatum(tce->btree_opintype))))
+			return;
+
+		/*
+		 * Get the operator in the btree's opfamily.
+		 */
+		eq_op = get_opfamily_member(tce->btree_opf,
+									tce->btree_opintype,
+									tce->btree_opintype,
+									BTEqualStrategyNumber);
+		if (!OidIsValid(eq_op))
+			return;
+		eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+		if (!eq_opfamilies)
+			return;
+		btree_opfamily = linitial_oid(eq_opfamilies);
+
+		exprs = lappend(exprs, tle->expr);
+		sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+		btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+	}
+
+	/*
+	 * Construct GroupExprInfo for each expression.
+	 */
+	forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+	{
+		Expr	   *expr = (Expr *) lfirst(lc1);
+		int			sortgroupref = lfirst_int(lc2);
+		Oid			btree_opfamily = lfirst_oid(lc3);
+		GroupExprInfo *ge_info;
+
+		ge_info = makeNode(GroupExprInfo);
+		ge_info->expr = (Expr *) copyObject(expr);
+		ge_info->sortgroupref = sortgroupref;
+		ge_info->btree_opfamily = btree_opfamily;
+
+		root->group_expr_list = lappend(root->group_expr_list, ge_info);
+	}
+}
 
 /*****************************************************************************
  *
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3341e64d2b..8b8def21ca 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -78,6 +78,8 @@ query_planner(PlannerInfo *root,
 	root->placeholder_list = NIL;
 	root->placeholder_array = NULL;
 	root->placeholder_array_size = 0;
+	root->agg_clause_list = NIL;
+	root->group_expr_list = NIL;
 	root->fkey_list = NIL;
 	root->initial_rels = NIL;
 
@@ -264,6 +266,12 @@ query_planner(PlannerInfo *root,
 	 */
 	extract_restriction_or_clauses(root);
 
+	/*
+	 * Check if eager aggregation is applicable, and if so, set up
+	 * root->agg_clause_list and root->group_expr_list.
+	 */
+	setup_eager_aggregation(root);
+
 	/*
 	 * Now expand appendrels by adding "otherrels" for their children.  We
 	 * delay this to the end so that we have as much information as possible
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 527a2b2734..515e6d7737 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -984,6 +984,16 @@ struct config_bool ConfigureNamesBool[] =
 		false,
 		NULL, NULL, NULL
 	},
+	{
+		{"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+			gettext_noop("Enables eager aggregation."),
+			NULL,
+			GUC_EXPLAIN
+		},
+		&enable_eager_aggregate,
+		false,
+		NULL, NULL, NULL
+	},
 	{
 		{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
 			gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c97f9a25f0..f841915482 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -401,6 +401,7 @@
 #enable_sort = on
 #enable_tidscan = on
 #enable_group_by_reordering = on
+#enable_eager_aggregate = off
 
 # - Planner Cost Constants -
 
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d67f725ad6..69ed9eb1f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -383,6 +383,12 @@ struct PlannerInfo
 	/* list of PlaceHolderInfos */
 	List	   *placeholder_list;
 
+	/* list of AggClauseInfos */
+	List	   *agg_clause_list;
+
+	/* List of GroupExprInfos */
+	List	   *group_expr_list;
+
 	/* array of PlaceHolderInfos indexed by phid */
 	struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
 	/* allocated size of array */
@@ -3193,6 +3199,41 @@ typedef struct MinMaxAggInfo
 	Param	   *param;
 } MinMaxAggInfo;
 
+/*
+ * The aggregate expressions that appear in targetlist and having clauses
+ */
+typedef struct AggClauseInfo
+{
+	pg_node_attr(no_read, no_query_jumble)
+
+	NodeTag		type;
+
+	/* the Aggref expr */
+	Aggref	   *aggref;
+
+	/* lowest level we can evaluate this aggregate at */
+	Relids		agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in grouping clauses
+ */
+typedef struct GroupExprInfo
+{
+	pg_node_attr(no_read, no_query_jumble)
+
+	NodeTag		type;
+
+	/* the represented expression */
+	Expr	   *expr;
+
+	/* the tleSortGroupRef of the corresponding SortGroupClause */
+	Index		sortgroupref;
+
+	/* btree opfamily defining the ordering */
+	Oid			btree_opfamily;
+} GroupExprInfo;
+
 /*
  * At runtime, PARAM_EXEC slots are used to pass values around from one plan
  * node to another.  They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 0e8a9c94ba..040a047b81 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
  * allpaths.c
  */
 extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
 extern PGDLLIMPORT int geqo_threshold;
 extern PGDLLIMPORT int min_parallel_table_scan_size;
 extern PGDLLIMPORT int min_parallel_index_scan_size;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index f2e3fa4c2e..42e0f37859 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,6 +73,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
 extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
 extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
 								   Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
 extern void find_lateral_references(PlannerInfo *root);
 extern void create_lateral_join_info(PlannerInfo *root);
 extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 9be7aca2b8..a83a41b0f8 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -113,6 +113,7 @@ select name, setting from pg_settings where name like 'enable%';
 --------------------------------+---------
  enable_async_append            | on
  enable_bitmapscan              | on
+ enable_eager_aggregate         | off
  enable_gathermerge             | on
  enable_group_by_reordering     | on
  enable_hashagg                 | on
@@ -134,7 +135,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(23 rows)
+(24 rows)
 
 -- There are always wait event descriptions for various types.
 select type, count(*) > 0 as ok FROM pg_wait_events
-- 
2.31.0



  [application/octet-stream] v1-0006-Build-grouped-relations-out-of-base-relations.patch (9.0K, 8-v1-0006-Build-grouped-relations-out-of-base-relations.patch)
  download | inline diff:
From 4d5639555cb14fa74f20e61ba79c155ec9be8b23 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Wed, 28 Feb 2024 10:03:41 +0800
Subject: [PATCH v1 6/9] Build grouped relations out of base relations

This commit builds grouped relations for each base relation if possible,
and generates aggregation paths for the grouped base relations.
---
 src/backend/optimizer/path/allpaths.c |  91 +++++++++++++++++++++++
 src/backend/optimizer/util/relnode.c  | 101 ++++++++++++++++++++++++++
 src/include/optimizer/pathnode.h      |   4 +
 3 files changed, 196 insertions(+)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f47ad04846..ea2341d110 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -96,6 +96,7 @@ join_search_hook_type join_search_hook = NULL;
 
 static void set_base_rel_consider_startup(PlannerInfo *root);
 static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
 static void set_base_rel_pathlists(PlannerInfo *root);
 static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
 						 Index rti, RangeTblEntry *rte);
@@ -120,6 +121,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
 								Index rti, RangeTblEntry *rte);
 static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 									Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
 static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
 										 List *live_childrels,
 										 List *all_child_pathkeys);
@@ -188,6 +190,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
 	 */
 	set_base_rel_sizes(root);
 
+	/*
+	 * Build grouped base relations for each base rel if possible.
+	 */
+	setup_base_grouped_rels(root);
+
 	/*
 	 * We should now have size estimates for every actual table involved in
 	 * the query, and we also know which if any have been deleted from the
@@ -329,6 +336,59 @@ set_base_rel_sizes(PlannerInfo *root)
 	}
 }
 
+/*
+ * setup_base_grouped_rels
+ *	  For each "plain" base relation build a grouped base relation if eager
+ *	  aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+	Index		rti;
+
+	/*
+	 * If there are no aggregate expressions or grouping expressions, eager
+	 * aggregation is not possible.
+	 */
+	if (root->agg_clause_list == NIL ||
+		root->group_expr_list == NIL)
+		return;
+
+	/*
+	 * Eager aggregation only makes sense if there are multiple base rels in
+	 * the query.
+	 */
+	if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+		return;
+
+	for (rti = 1; rti < root->simple_rel_array_size; rti++)
+	{
+		RelOptInfo	   *rel = root->simple_rel_array[rti];
+		RelOptInfo	   *rel_grouped;
+		RelAggInfo	   *agg_info;
+
+		/* there may be empty slots corresponding to non-baserel RTEs */
+		if (rel == NULL)
+			continue;
+
+		Assert(rel->relid == rti); /* sanity check on array */
+
+		/*
+		 * Ignore RTEs that are not simple rels.  Note that we need to consider
+		 * "other rels" here.
+		 */
+		if (!IS_SIMPLE_REL(rel))
+			continue;
+
+		rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+		if (rel_grouped)
+		{
+			/* Make the grouped relation available for joining. */
+			add_grouped_rel(root, rel_grouped, agg_info);
+		}
+	}
+}
+
 /*
  * set_base_rel_pathlists
  *	  Finds all paths available for scanning each base-relation entry.
@@ -565,6 +625,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	/* Now find the cheapest of the paths for this rel */
 	set_cheapest(rel);
 
+	/*
+	 * If a grouped relation for this rel exists, build partial aggregation
+	 * paths for it.
+	 *
+	 * Note that this can only happen after we've called set_cheapest() for
+	 * this base rel, because we need its cheapest paths.
+	 */
+	set_grouped_rel_pathlist(root, rel);
+
 #ifdef OPTIMIZER_DEBUG
 	pprint(rel);
 #endif
@@ -1292,6 +1361,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
 	add_paths_to_append_rel(root, rel, live_childrels);
 }
 
+/*
+ * set_grouped_rel_pathlist
+ *	  If a grouped relation for the given 'rel' exists, build partial
+ *	  aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+	RelOptInfo   *rel_grouped;
+	RelAggInfo   *agg_info;
+
+	/* Add paths to the grouped base relation if one exists. */
+	rel_grouped = find_grouped_rel(root, rel->relids,
+								   &agg_info);
+	if (rel_grouped)
+	{
+		generate_grouped_paths(root, rel_grouped, rel,
+							   agg_info);
+		set_cheapest(rel_grouped);
+	}
+}
+
 
 /*
  * add_paths_to_append_rel
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e7f465ef7b..83cdbb38bc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
 
 #include <limits.h>
 
+#include "catalog/pg_constraint.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
@@ -27,12 +28,15 @@
 #include "optimizer/paths.h"
 #include "optimizer/placeholder.h"
 #include "optimizer/plancat.h"
+#include "optimizer/planner.h"
 #include "optimizer/restrictinfo.h"
 #include "optimizer/tlist.h"
 #include "rewrite/rewriteManip.h"
+#include "parser/parse_oper.h"
 #include "parser/parse_relation.h"
 #include "utils/hsearch.h"
 #include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
 
 
 /*
@@ -411,6 +415,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
 	return rel;
 }
 
+/*
+ * build_simple_grouped_rel
+ *	  Construct a new RelOptInfo for a grouped base relation out of an existing
+ *	  non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+						 RelAggInfo **agg_info_p)
+{
+	RelOptInfo	   *rel_plain;
+	RelOptInfo	   *rel_grouped;
+	RelAggInfo	   *agg_info;
+
+	/*
+	 * We should have available aggregate expressions and grouping expressions,
+	 * otherwise we cannot reach here.
+	 */
+	Assert(root->agg_clause_list != NIL);
+	Assert(root->group_expr_list != NIL);
+
+	rel_plain = root->simple_rel_array[relid];
+	Assert(rel_plain != NULL);
+	Assert(IS_SIMPLE_REL(rel_plain));
+
+	/* nothing to do for dummy rel */
+	if (IS_DUMMY_REL(rel_plain))
+		return NULL;
+
+	/*
+	 * Prepare the information we need to create grouped paths for this base
+	 * relation.
+	 */
+	agg_info = create_rel_agg_info(root, rel_plain);
+	if (agg_info == NULL)
+		return NULL;
+
+	/* build a grouped relation out of the plain relation */
+	rel_grouped = build_grouped_rel(root, rel_plain);
+	rel_grouped->reltarget = agg_info->target;
+	rel_grouped->rows = agg_info->grouped_rows;
+
+	/* return the RelAggInfo structure */
+	*agg_info_p = agg_info;
+
+	return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ *	  Build a grouped relation by flat copying a plain relation and resetting
+ *	  the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+	RelOptInfo   *rel_grouped;
+
+	rel_grouped = makeNode(RelOptInfo);
+	memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+	/*
+	 * clear path info
+	 */
+	rel_grouped->pathlist = NIL;
+	rel_grouped->ppilist = NIL;
+	rel_grouped->partial_pathlist = NIL;
+	rel_grouped->cheapest_startup_path = NULL;
+	rel_grouped->cheapest_total_path = NULL;
+	rel_grouped->cheapest_unique_path = NULL;
+	rel_grouped->cheapest_parameterized_paths = NIL;
+
+	/*
+	 * clear partition info
+	 */
+	rel_grouped->part_scheme = NULL;
+	rel_grouped->nparts = -1;
+	rel_grouped->boundinfo = NULL;
+	rel_grouped->partbounds_merged = false;
+	rel_grouped->partition_qual = NIL;
+	rel_grouped->part_rels = NULL;
+	rel_grouped->live_parts = NULL;
+	rel_grouped->all_partrels = NULL;
+	rel_grouped->partexprs = NULL;
+	rel_grouped->nullable_partexprs = NULL;
+	rel_grouped->consider_partitionwise_join = false;
+
+	/*
+	 * clear size estimates
+	 */
+	rel_grouped->rows = 0;
+
+	return rel_grouped;
+}
+
 /*
  * find_base_rel
  *	  Find a base or otherrel relation entry, which must already exist.
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8d03ce2c57..6b856a5e77 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -306,6 +306,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
 extern void expand_planner_arrays(PlannerInfo *root, int add_size);
 extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
 									RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+											RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+									 RelOptInfo *rel_plain);
 extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
 extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
 extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
-- 
2.31.0



  [application/octet-stream] v1-0007-Build-grouped-relations-out-of-join-relations.patch (19.3K, 9-v1-0007-Build-grouped-relations-out-of-join-relations.patch)
  download | inline diff:
From 429cab42ee94a88eef79dfb3575ded35b8056a1c Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 13:33:09 +0800
Subject: [PATCH v1 7/9] Build grouped relations out of join relations

This commit builds grouped relations for each just-processed join
relation if possible, and generates aggregation paths for the grouped
join relations.

If we are joining rel1 and rel2, the aggregation paths for the grouped
join relation are generated by 1) joining the grouped paths of rel1 to
the plain paths of rel2, or joining the grouped paths of rel2 to the
plain paths of rel1, and 2) adding sorted and hashed partial aggregation
paths on top of paths of the plain join rel except for the topmost join
rel.

This commit also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that.

This commit also makes eager aggregation work for partitionwise join and
for geqo.

Starting from this commit, you should be able to see plans with eager
aggregation.
---
 src/backend/optimizer/geqo/geqo_eval.c  |  84 +++++++++++++----
 src/backend/optimizer/path/allpaths.c   |  48 ++++++++++
 src/backend/optimizer/path/joinrels.c   | 115 ++++++++++++++++++++++++
 src/backend/optimizer/plan/planner.c    |  35 ++++++--
 src/backend/optimizer/util/appendinfo.c |  64 +++++++++++++
 5 files changed, 320 insertions(+), 26 deletions(-)

diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..278857d767 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
 	MemoryContext oldcxt;
 	RelOptInfo *joinrel;
 	Cost		fitness;
-	int			savelength;
-	struct HTAB *savehash;
+	int			savelength_join_rel;
+	struct HTAB *savehash_join_rel;
+	int			savelength_grouped_rel;
+	struct HTAB *savehash_grouped_rel;
+	int			savelength_grouped_info;
+	struct HTAB *savehash_grouped_info;
 
 	/*
 	 * Create a private memory context that will hold all temp storage
@@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
 	oldcxt = MemoryContextSwitchTo(mycontext);
 
 	/*
-	 * gimme_tree will add entries to root->join_rel_list, which may or may
-	 * not already contain some entries.  The newly added entries will be
-	 * recycled by the MemoryContextDelete below, so we must ensure that the
-	 * list is restored to its former state before exiting.  We can do this by
-	 * truncating the list to its original length.  NOTE this assumes that any
-	 * added entries are appended at the end!
+	 * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+	 * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+	 * already contain some entries.  The newly added entries will be recycled
+	 * by the MemoryContextDelete below, so we must ensure that each list of
+	 * the RelInfoList structures is restored to its former state before
+	 * exiting.  We can do this by truncating each list to its original length.
+	 * NOTE this assumes that any added entries are appended at the end!
 	 *
-	 * We also must take care not to mess up the outer join_rel_list->hash, if
-	 * there is one.  We can do this by just temporarily setting the link to
-	 * NULL.  (If we are dealing with enough join rels, which we very likely
-	 * are, a new hash table will get built and used locally.)
+	 * We also must take care not to mess up the outer hash tables of the
+	 * RelInfoList structures, if any.  We can do this by just temporarily
+	 * setting each link to NULL.  (If we are dealing with enough join rels,
+	 * which we very likely are, new hash tables will get built and used
+	 * locally.)
 	 *
 	 * join_rel_level[] shouldn't be in use, so just Assert it isn't.
 	 */
-	savelength = list_length(root->join_rel_list->items);
-	savehash = root->join_rel_list->hash;
+	savelength_join_rel = list_length(root->join_rel_list->items);
+	savehash_join_rel = root->join_rel_list->hash;
+
+	savelength_grouped_rel =
+		list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items);
+	savehash_grouped_rel =
+		root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash;
+
+	savelength_grouped_info = list_length(root->agg_info_list->items);
+	savehash_grouped_info = root->agg_info_list->hash;
+
 	Assert(root->join_rel_level == NULL);
 
 	root->join_rel_list->hash = NULL;
+	root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+	root->agg_info_list->hash = NULL;
 
 	/* construct the best path for the given combination of relations */
 	joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
 		fitness = DBL_MAX;
 
 	/*
-	 * Restore join_rel_list to its former state, and put back original
-	 * hashtable if any.
+	 * Restore each of the list in join_rel_list, agg_info_list and
+	 * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put back
+	 * original hashtable if any.
 	 */
 	root->join_rel_list->items = list_truncate(root->join_rel_list->items,
-											   savelength);
-	root->join_rel_list->hash = savehash;
+											   savelength_join_rel);
+	root->join_rel_list->hash = savehash_join_rel;
+
+	root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items =
+		list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items,
+					  savelength_grouped_rel);
+	root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel;
+
+	root->agg_info_list->items = list_truncate(root->agg_info_list->items,
+											   savelength_grouped_info);
+	root->agg_info_list->hash = savehash_grouped_info;
 
 	/* release all the memory acquired within gimme_tree */
 	MemoryContextSwitchTo(oldcxt);
@@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
 				/* Find and save the cheapest paths for this joinrel */
 				set_cheapest(joinrel);
 
+				/*
+				 * Except for the topmost scan/join rel, consider generating
+				 * partial aggregation paths for the grouped relation on top of the
+				 * paths of this rel.  After that, we're done creating paths for
+				 * the grouped relation, so run set_cheapest().
+				 */
+				if (!bms_equal(joinrel->relids, root->all_query_rels))
+				{
+					RelOptInfo   *rel_grouped;
+					RelAggInfo   *agg_info;
+
+					rel_grouped = find_grouped_rel(root, joinrel->relids,
+												   &agg_info);
+					if (rel_grouped)
+					{
+						generate_grouped_paths(root, rel_grouped, joinrel,
+											   agg_info);
+						set_cheapest(rel_grouped);
+					}
+				}
+
 				/* Absorb new clump into old */
 				old_clump->joinrel = joinrel;
 				old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index ea2341d110..440a5daec7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3864,6 +3864,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 		 *
 		 * After that, we're done creating paths for the joinrel, so run
 		 * set_cheapest().
+		 *
+		 * In addition, we also run generate_grouped_paths() for the grouped
+		 * relation of each just-processed joinrel, and run set_cheapest() for
+		 * the grouped relation afterwards.
 		 */
 		foreach(lc, root->join_rel_level[lev])
 		{
@@ -3884,6 +3888,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
 			/* Find and save the cheapest paths for this rel */
 			set_cheapest(rel);
 
+			/*
+			 * Except for the topmost scan/join rel, consider generating
+			 * partial aggregation paths for the grouped relation on top of the
+			 * paths of this rel.  After that, we're done creating paths for
+			 * the grouped relation, so run set_cheapest().
+			 */
+			if (!bms_equal(rel->relids, root->all_query_rels))
+			{
+				RelOptInfo   *rel_grouped;
+				RelAggInfo   *agg_info;
+
+				rel_grouped = find_grouped_rel(root, rel->relids,
+											   &agg_info);
+				if (rel_grouped)
+				{
+					generate_grouped_paths(root, rel_grouped, rel,
+										   agg_info);
+					set_cheapest(rel_grouped);
+				}
+			}
+
 #ifdef OPTIMIZER_DEBUG
 			pprint(rel);
 #endif
@@ -4752,6 +4777,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
 		if (IS_DUMMY_REL(child_rel))
 			continue;
 
+		/*
+		 * Except for the topmost scan/join rel, consider generating partial
+		 * aggregation paths for the grouped relation on top of the paths of
+		 * this partitioned child-join.  After that, we're done creating paths
+		 * for the grouped relation, so run set_cheapest().
+		 */
+		if (!bms_equal(IS_OTHER_REL(rel) ?
+					   rel->top_parent_relids : rel->relids,
+					   root->all_query_rels))
+		{
+			RelOptInfo   *rel_grouped;
+			RelAggInfo   *agg_info;
+
+			rel_grouped = find_grouped_rel(root, child_rel->relids,
+										   &agg_info);
+			if (rel_grouped)
+			{
+				generate_grouped_paths(root, rel_grouped, child_rel,
+									   agg_info);
+				set_cheapest(rel_grouped);
+			}
+		}
+
 #ifdef OPTIMIZER_DEBUG
 		pprint(child_rel);
 #endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4750579b0a..a9ef081597 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
 
 #include "miscadmin.h"
 #include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
 #include "optimizer/joininfo.h"
 #include "optimizer/pathnode.h"
 #include "optimizer/paths.h"
 #include "partitioning/partbounds.h"
 #include "utils/memutils.h"
+#include "utils/selfuncs.h"
 
 
 static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
 static bool restriction_is_constant_false(List *restrictlist,
 										  RelOptInfo *joinrel,
 										  bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+								  RelOptInfo *rel2, RelOptInfo *joinrel,
+								  SpecialJoinInfo *sjinfo, List *restrictlist);
 static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
 										RelOptInfo *rel2, RelOptInfo *joinrel,
 										SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -753,6 +758,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
 		return joinrel;
 	}
 
+	/* Build a grouped join relation for 'joinrel' if possible. */
+	make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+						  restrictlist);
+
 	/* Add paths to the join relation. */
 	populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
 								restrictlist);
@@ -864,6 +873,107 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
 	return input_relids;
 }
 
+/*
+ * make_grouped_join_rel
+ *	  Build a grouped join relation out of 'joinrel' if eager aggregation is
+ *	  possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+					  RelOptInfo *rel2, RelOptInfo *joinrel,
+					  SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+	Relids			joinrelids;
+	RelOptInfo	   *rel_grouped;
+	RelAggInfo	   *agg_info = NULL;
+	RelOptInfo	   *rel1_grouped;
+	RelOptInfo	   *rel2_grouped;
+
+	/*
+	 * If there are no aggregate expressions or grouping expressions, eager
+	 * aggregation is not possible.
+	 */
+	if (root->agg_clause_list == NIL ||
+		root->group_expr_list == NIL)
+		return;
+
+	joinrelids = bms_union(rel1->relids, rel2->relids);
+	rel_grouped = find_grouped_rel(root, joinrelids, &agg_info);
+
+	/*
+	 * Construct a new RelOptInfo for the grouped join relation if there is no
+	 * existing one.
+	 */
+	if (rel_grouped == NULL)
+	{
+		/*
+		 * Prepare the information we need to create grouped paths for this
+		 * join relation.
+		 */
+		agg_info = create_rel_agg_info(root, joinrel);
+		if (agg_info == NULL)
+			return;
+
+		/* build a grouped relation out of the plain relation */
+		rel_grouped = build_grouped_rel(root, joinrel);
+		rel_grouped->reltarget = agg_info->target;
+		rel_grouped->rows = agg_info->grouped_rows;
+
+		/*
+		 * Make the grouped relation available for further joining or for
+		 * acting as the upper rel representing the result of partial
+		 * aggregation.
+		 */
+		add_grouped_rel(root, rel_grouped, agg_info);
+	}
+
+	Assert(agg_info != NULL);
+
+	/* retrieve the grouped relations for the two input rels */
+	rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+	rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+	/* we should not see dummy grouped relation */
+	Assert(rel1_grouped == NULL || !IS_DUMMY_REL(rel1_grouped));
+	Assert(rel2_grouped == NULL || !IS_DUMMY_REL(rel2_grouped));
+
+	/* Nothing to do if there's no grouped relation. */
+	if (rel1_grouped == NULL &&
+		rel2_grouped == NULL)
+		return;
+
+	/*
+	 * Join of two grouped relations is currently not supported.  In such a
+	 * case, grouping of one side would change the occurrence of the other
+	 * side's aggregate transient states on the input of the final aggregation.
+	 * This can be handled by adjusting the transient states, but it's not
+	 * worth the effort for now.
+	 */
+	if (rel1_grouped != NULL &&
+		rel2_grouped != NULL)
+		return;
+
+	/* generate partial aggregation paths for the grouped relation */
+	if (rel1_grouped != NULL)
+	{
+		set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+								   sjinfo, restrictlist);
+		populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+									sjinfo, restrictlist);
+	}
+	else if (rel2_grouped != NULL)
+	{
+		set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+								   sjinfo, restrictlist);
+		populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+									sjinfo, restrictlist);
+	}
+}
+
 /*
  * populate_joinrel_with_paths
  *	  Add paths to the given joinrel for given pair of joining relations. The
@@ -1653,6 +1763,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
 						 adjust_child_relids(joinrel->relids,
 											 nappinfos, appinfos)));
 
+		/* Build a grouped join relation for 'child_joinrel' if possible */
+		make_grouped_join_rel(root, child_rel1, child_rel2,
+							  child_joinrel, child_sjinfo,
+							  child_restrictlist);
+
 		/* And make paths for the child join */
 		populate_joinrel_with_paths(root, child_rel1, child_rel2,
 									child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index be4e182869..f8f2a09f1b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3946,10 +3946,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
 
 	/* Gather any partially grouped partial paths. */
 	if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
-	{
 		gather_grouping_paths(root, partially_grouped_rel);
+
+	/*
+	 * Now choose the best path(s) for partially_grouped_rel.
+	 *
+	 * Note that the non-partial paths can come either from the Gather above or
+	 * from eager aggregation.
+	 */
+	if (partially_grouped_rel && partially_grouped_rel->pathlist)
 		set_cheapest(partially_grouped_rel);
-	}
 
 	/*
 	 * Estimate number of groups.
@@ -7043,6 +7049,13 @@ create_partial_grouping_paths(PlannerInfo *root,
 	bool		can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
 	bool		can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
 
+	/*
+	 * The partially_grouped_rel could have been already created due to eager
+	 * aggregation.
+	 */
+	partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+	Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
 	/*
 	 * Consider whether we should generate partially aggregated non-partial
 	 * paths.  We can only do this if we have a non-partial path, and only if
@@ -7066,19 +7079,25 @@ create_partial_grouping_paths(PlannerInfo *root,
 	 * If we can't partially aggregate partial paths, and we can't partially
 	 * aggregate non-partial paths, then don't bother creating the new
 	 * RelOptInfo at all, unless the caller specified force_rel_creation.
+	 *
+	 * Note that the partially_grouped_rel could have been already created and
+	 * populated with appropriate paths by eager aggregation.
 	 */
 	if (cheapest_total_path == NULL &&
 		cheapest_partial_path == NULL &&
 		!force_rel_creation)
-		return NULL;
+		return partially_grouped_rel;
 
 	/*
 	 * Build a new upper relation to represent the result of partially
-	 * aggregating the rows from the input relation.
-	 */
-	partially_grouped_rel = fetch_upper_rel(root,
-											UPPERREL_PARTIAL_GROUP_AGG,
-											grouped_rel->relids);
+	 * aggregating the rows from the input relation.  The relation may already
+	 * exist due to eager aggregation, in which case we don't need to create
+	 * it.
+	 */
+	if (partially_grouped_rel == NULL)
+		partially_grouped_rel = fetch_upper_rel(root,
+												UPPERREL_PARTIAL_GROUP_AGG,
+												grouped_rel->relids);
 	partially_grouped_rel->consider_parallel =
 		grouped_rel->consider_parallel;
 	partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 51fdeace7d..7016473047 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -495,6 +495,70 @@ adjust_appendrel_attrs_mutator(Node *node,
 		return (Node *) newinfo;
 	}
 
+	/*
+	 * We have to process RelAggInfo nodes specially.
+	 */
+	if (IsA(node, RelAggInfo))
+	{
+		RelAggInfo *oldinfo = (RelAggInfo *) node;
+		RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+		/* Copy all flat-copiable fields */
+		memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+		newinfo->relids = adjust_child_relids(oldinfo->relids,
+											  context->nappinfos,
+											  context->appinfos);
+
+		newinfo->target = (PathTarget *)
+			adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+										   context);
+
+		newinfo->agg_input = (PathTarget *)
+			adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+										   context);
+
+		newinfo->group_clauses = (List *)
+			adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+										   context);
+
+		newinfo->group_exprs = (List *)
+			adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+										   context);
+
+		newinfo->agg_exprs = (List *)
+			adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_exprs,
+										   context);
+
+		return (Node *) newinfo;
+	}
+
+	/*
+	 * We have to process PathTarget nodes specially.
+	 */
+	if (IsA(node, PathTarget))
+	{
+		PathTarget *oldtarget = (PathTarget *) node;
+		PathTarget *newtarget = makeNode(PathTarget);
+
+		/* Copy all flat-copiable fields */
+		memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+		if (oldtarget->sortgrouprefs)
+		{
+			Size	nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+			newtarget->exprs = (List *)
+				adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+											   context);
+
+			newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+			memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+		}
+
+		return (Node *) newtarget;
+	}
+
 	/*
 	 * NOTE: we do not need to recurse into sublinks, because they should
 	 * already have been converted to subplans before we see them.
-- 
2.31.0



  [application/octet-stream] v1-0009-Add-README.patch (4.8K, 10-v1-0009-Add-README.patch)
  download | inline diff:
From 2037ffb3a2636203d4105c2ee0e47b9aa67041d7 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 13:41:36 +0800
Subject: [PATCH v1 9/9] Add README

---
 src/backend/optimizer/README | 88 ++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..fa5cdc135f 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,91 @@ breaking down aggregation or grouping over a partitioned relation into
 aggregation or grouping over its partitions is called partitionwise
 aggregation.  Especially when the partition keys match the GROUP BY clause,
 this can be significantly faster than the regular method.
+
+Eager aggregation
+-------------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of Agg node. However, if the groups are large enough, it may be more
+efficient to apply the partial aggregation to the output of base relation
+scan, and finalize it when we have all relations of the query joined:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+   Group Key: a.i
+   ->  Nested Loop
+         ->  Partial HashAggregate
+               Group Key: b.j
+               ->  Seq Scan on b
+         ->  Index Only Scan using a_pkey on a
+               Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which can in turn lead to considerably lower join cost.
+
+Note that the GROUP BY expression might not be useful for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class {a.i,
+b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism is suitable
+because it's designed to derive join clauses, and at the same time the join
+clauses determine the choice of grouping columns of the partial aggregate: the
+only way for the partial aggregate to provide upper join(s) with input values
+is to have the join input expression(s) in the grouping key; besides grouping
+columns, the partial aggregate can only produce the transient states of the
+aggregate functions, but aggregate functions cannot be referenced by the JOIN
+clauses.
+
+Regarding correctness, join node considers the output of the partial aggregate
+to be equivalent to the output of a plain (non-aggregated) relation scan. That
+is, a group (i.e. a row of the partial aggregate output) matches the other
+side of the join if and only if each row of the non-aggregate relation
+does. In other words, all rows belonging to the same group have the same value
+of the join columns (As mentioned above, a join cannot reference other output
+expressions of the partial aggregate than the grouping expressions.).
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by either grouping expression
+or aggregate function can be set to NULL by an outer join above the relation
+to which we want to apply the partiall aggregation. The point is that those
+NULL values would not appear on the input of the pushed-down, so it could
+either put the rows into groups in a different way than the aggregate at the
+top of the plan, or it could compute wrong values of the aggregate functions.
+
+Besides base relation, the aggregation can also be pushed down to join:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+        JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+   Group Key: a.i
+   ->  Nested Loop
+         ->  Partial HashAggregate
+               Group Key: b.j
+               ->  Hash Join
+                     Hash Cond: (b.j = c.i)
+                     ->  Seq Scan on b
+                     ->  Hash
+                           ->  Seq Scan on c
+         ->  Index Only Scan using a_pkey on a
+               Index Cond: (i = b.j)
+
+Whether the Agg node is created out of base relation or out of join, it's
+added to a separate RelOptInfo that we call "grouped relation". Grouped
+relation can be joined to a non-grouped relation, which results in a grouped
+relation too. Join of two grouped relations does not seem to be very useful
+and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, these
+are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further
+processing of these paths then does not differ from processing of other
+partially grouped paths.
-- 
2.31.0



  [application/octet-stream] v1-0008-Add-test-cases.patch (66.8K, 11-v1-0008-Add-test-cases.patch)
  download | inline diff:
From 09fed8131d6b2def5e5d76c7b73e86a9ae997c7a Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 13:41:22 +0800
Subject: [PATCH v1 8/9] Add test cases

---
 src/test/regress/expected/eager_aggregate.out | 1270 +++++++++++++++++
 src/test/regress/parallel_schedule            |    2 +-
 src/test/regress/sql/eager_aggregate.sql      |  205 +++
 3 files changed, 1476 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/eager_aggregate.out
 create mode 100644 src/test/regress/sql/eager_aggregate.sql

diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..2d7dec8a5d
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1270 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize HashAggregate
+   Output: t1.a, avg(t2.c)
+   Group Key: t1.a
+   ->  Hash Join
+         Output: t1.a, (PARTIAL avg(t2.c))
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg(t2.c))
+               ->  Partial HashAggregate
+                     Output: t2.b, PARTIAL avg(t2.c)
+                     Group Key: t2.b
+                     ->  Seq Scan on public.eager_agg_t2 t2
+                           Output: t2.a, t2.b, t2.c
+(15 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg 
+---+-----
+ 1 | 496
+ 2 | 497
+ 6 | 501
+ 7 | 502
+ 3 | 498
+ 4 | 499
+ 9 | 504
+ 5 | 500
+ 8 | 503
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Output: t1.a, avg(t2.c)
+   Group Key: t1.a
+   ->  Sort
+         Output: t1.a, (PARTIAL avg(t2.c))
+         Sort Key: t1.a
+         ->  Hash Join
+               Output: t1.a, (PARTIAL avg(t2.c))
+               Hash Cond: (t1.b = t2.b)
+               ->  Seq Scan on public.eager_agg_t1 t1
+                     Output: t1.a, t1.b, t1.c
+               ->  Hash
+                     Output: t2.b, (PARTIAL avg(t2.c))
+                     ->  Partial GroupAggregate
+                           Output: t2.b, PARTIAL avg(t2.c)
+                           Group Key: t2.b
+                           ->  Sort
+                                 Output: t2.c, t2.b
+                                 Sort Key: t2.b
+                                 ->  Seq Scan on public.eager_agg_t2 t2
+                                       Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg 
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+                               QUERY PLAN                               
+------------------------------------------------------------------------
+ Finalize HashAggregate
+   Output: t1.a, avg((t2.c + t3.c))
+   Group Key: t1.a
+   ->  Hash Join
+         Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+         Hash Cond: (t1.b = t2.b)
+         ->  Seq Scan on public.eager_agg_t1 t1
+               Output: t1.a, t1.b, t1.c
+         ->  Hash
+               Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+               ->  Partial HashAggregate
+                     Output: t2.b, PARTIAL avg((t2.c + t3.c))
+                     Group Key: t2.b
+                     ->  Hash Join
+                           Output: t2.c, t3.c, t2.b
+                           Hash Cond: (t3.a = t2.a)
+                           ->  Seq Scan on public.eager_agg_t3 t3
+                                 Output: t3.a, t3.b, t3.c
+                           ->  Hash
+                                 Output: t2.c, t2.b, t2.a
+                                 ->  Seq Scan on public.eager_agg_t2 t2
+                                       Output: t2.c, t2.b, t2.a
+(22 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ a | avg 
+---+-----
+ 1 | 497
+ 2 | 499
+ 6 | 507
+ 7 | 509
+ 3 | 501
+ 4 | 503
+ 9 | 513
+ 5 | 505
+ 8 | 511
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+                                     QUERY PLAN                                     
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Output: t1.a, avg((t2.c + t3.c))
+   Group Key: t1.a
+   ->  Sort
+         Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+         Sort Key: t1.a
+         ->  Hash Join
+               Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+               Hash Cond: (t1.b = t2.b)
+               ->  Seq Scan on public.eager_agg_t1 t1
+                     Output: t1.a, t1.b, t1.c
+               ->  Hash
+                     Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+                     ->  Partial GroupAggregate
+                           Output: t2.b, PARTIAL avg((t2.c + t3.c))
+                           Group Key: t2.b
+                           ->  Sort
+                                 Output: t2.c, t3.c, t2.b
+                                 Sort Key: t2.b
+                                 ->  Hash Join
+                                       Output: t2.c, t3.c, t2.b
+                                       Hash Cond: (t3.a = t2.a)
+                                       ->  Seq Scan on public.eager_agg_t3 t3
+                                             Output: t3.a, t3.b, t3.c
+                                       ->  Hash
+                                             Output: t2.c, t2.b, t2.a
+                                             ->  Seq Scan on public.eager_agg_t2 t2
+                                                   Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ a | avg 
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Finalize GroupAggregate
+   Output: t3.a, avg(t3.c)
+   Group Key: t3.a
+   ->  Sort
+         Output: t3.a, (PARTIAL avg(t3.c))
+         Sort Key: t3.a
+         ->  Hash Left Join
+               Output: t3.a, (PARTIAL avg(t3.c))
+               Hash Cond: (t3.b = t1.b)
+               ->  Partial HashAggregate
+                     Output: t3.a, t3.b, PARTIAL avg(t3.c)
+                     Group Key: t3.a, t3.b
+                     ->  Seq Scan on public.eager_agg_t3 t3
+                           Output: t3.a, t3.b, t3.c
+               ->  Hash
+                     Output: t1.b
+                     ->  Seq Scan on public.eager_agg_t1 t1
+                           Output: t1.b
+(18 rows)
+
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ a | avg 
+---+-----
+ 0 | 505
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+                      QUERY PLAN                      
+------------------------------------------------------
+ HashAggregate
+   Output: t3.a, avg(t3.c)
+   Group Key: t3.a
+   ->  Hash Right Join
+         Output: t3.a, t3.c
+         Hash Cond: (t3.b = t1.b)
+         ->  Seq Scan on public.eager_agg_t3 t3
+               Output: t3.a, t3.b, t3.c
+         ->  Hash
+               Output: t1.b
+               ->  Seq Scan on public.eager_agg_t1 t1
+                     Output: t1.b
+(12 rows)
+
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ a | avg 
+---+-----
+ 8 | 503
+   |    
+ 9 | 504
+ 7 | 502
+ 1 | 496
+ 5 | 500
+ 4 | 499
+ 2 | 497
+ 6 | 501
+ 3 | 498
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Output: t1.a, avg(t2.c)
+   Group Key: t1.a
+   ->  Sort
+         Output: t1.a, (PARTIAL avg(t2.c))
+         Sort Key: t1.a
+         ->  Gather
+               Output: t1.a, (PARTIAL avg(t2.c))
+               Workers Planned: 2
+               ->  Parallel Hash Join
+                     Output: t1.a, (PARTIAL avg(t2.c))
+                     Hash Cond: (t1.b = t2.b)
+                     ->  Parallel Seq Scan on public.eager_agg_t1 t1
+                           Output: t1.a, t1.b, t1.c
+                     ->  Parallel Hash
+                           Output: t2.b, (PARTIAL avg(t2.c))
+                           ->  Partial HashAggregate
+                                 Output: t2.b, PARTIAL avg(t2.c)
+                                 Group Key: t2.b
+                                 ->  Parallel Seq Scan on public.eager_agg_t2 t2
+                                       Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg 
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
+ Append
+   ->  Finalize HashAggregate
+         Output: t1.x, sum(t1.y), count(*)
+         Group Key: t1.x
+         ->  Hash Join
+               Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+               Hash Cond: (t2.y = t1.x)
+               ->  Seq Scan on public.eager_agg_tab2_p1 t2
+                     Output: t2.y
+               ->  Hash
+                     Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+                           Group Key: t1.x
+                           ->  Seq Scan on public.eager_agg_tab1_p1 t1
+                                 Output: t1.x, t1.y
+   ->  Finalize HashAggregate
+         Output: t1_1.x, sum(t1_1.y), count(*)
+         Group Key: t1_1.x
+         ->  Hash Join
+               Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+               Hash Cond: (t2_1.y = t1_1.x)
+               ->  Seq Scan on public.eager_agg_tab2_p2 t2_1
+                     Output: t2_1.y
+               ->  Hash
+                     Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+                           Group Key: t1_1.x
+                           ->  Seq Scan on public.eager_agg_tab1_p2 t1_1
+                                 Output: t1_1.x, t1_1.y
+   ->  Finalize HashAggregate
+         Output: t1_2.x, sum(t1_2.y), count(*)
+         Group Key: t1_2.x
+         ->  Hash Join
+               Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+               Hash Cond: (t2_2.y = t1_2.x)
+               ->  Seq Scan on public.eager_agg_tab2_p3 t2_2
+                     Output: t2_2.y
+               ->  Hash
+                     Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+                           Group Key: t1_2.x
+                           ->  Seq Scan on public.eager_agg_tab1_p3 t1_2
+                                 Output: t1_2.x, t1_2.y
+(46 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+ x  | sum  | count 
+----+------+-------
+  6 | 1100 |   100
+  0 |  500 |   100
+ 12 |  700 |   100
+ 18 | 1300 |   100
+ 24 |  900 |   100
+(5 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
+ Append
+   ->  Finalize HashAggregate
+         Output: t2.y, sum(t1.y), count(*)
+         Group Key: t2.y
+         ->  Hash Join
+               Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+               Hash Cond: (t2.y = t1.x)
+               ->  Seq Scan on public.eager_agg_tab2_p1 t2
+                     Output: t2.y
+               ->  Hash
+                     Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+                           Group Key: t1.x
+                           ->  Seq Scan on public.eager_agg_tab1_p1 t1
+                                 Output: t1.y, t1.x
+   ->  Finalize HashAggregate
+         Output: t2_1.y, sum(t1_1.y), count(*)
+         Group Key: t2_1.y
+         ->  Hash Join
+               Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+               Hash Cond: (t2_1.y = t1_1.x)
+               ->  Seq Scan on public.eager_agg_tab2_p2 t2_1
+                     Output: t2_1.y
+               ->  Hash
+                     Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+                           Group Key: t1_1.x
+                           ->  Seq Scan on public.eager_agg_tab1_p2 t1_1
+                                 Output: t1_1.y, t1_1.x
+   ->  Finalize HashAggregate
+         Output: t2_2.y, sum(t1_2.y), count(*)
+         Group Key: t2_2.y
+         ->  Hash Join
+               Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+               Hash Cond: (t2_2.y = t1_2.x)
+               ->  Seq Scan on public.eager_agg_tab2_p3 t2_2
+                     Output: t2_2.y
+               ->  Hash
+                     Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+                           Group Key: t1_2.x
+                           ->  Seq Scan on public.eager_agg_tab1_p3 t1_2
+                                 Output: t1_2.y, t1_2.x
+(46 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+ y  | sum  | count 
+----+------+-------
+  6 | 1100 |   100
+  0 |  500 |   100
+ 18 | 1300 |   100
+ 12 |  700 |   100
+ 24 |  900 |   100
+(5 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+                                              QUERY PLAN                                              
+------------------------------------------------------------------------------------------------------
+ Finalize HashAggregate
+   Output: t2.x, sum(t1.x), count(*)
+   Group Key: t2.x
+   Filter: (avg(t1.x) > '10'::numeric)
+   ->  Append
+         ->  Hash Join
+               Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+               Hash Cond: (t2_1.y = t1_1.x)
+               ->  Seq Scan on public.eager_agg_tab2_p1 t2_1
+                     Output: t2_1.x, t2_1.y
+               ->  Hash
+                     Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+                     ->  Partial HashAggregate
+                           Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+                           Group Key: t1_1.x
+                           ->  Seq Scan on public.eager_agg_tab1_p1 t1_1
+                                 Output: t1_1.x
+         ->  Hash Join
+               Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+               Hash Cond: (t2_2.y = t1_2.x)
+               ->  Seq Scan on public.eager_agg_tab2_p2 t2_2
+                     Output: t2_2.x, t2_2.y
+               ->  Hash
+                     Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+                     ->  Partial HashAggregate
+                           Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+                           Group Key: t1_2.x
+                           ->  Seq Scan on public.eager_agg_tab1_p2 t1_2
+                                 Output: t1_2.x
+         ->  Hash Join
+               Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+               Hash Cond: (t2_3.y = t1_3.x)
+               ->  Seq Scan on public.eager_agg_tab2_p3 t2_3
+                     Output: t2_3.x, t2_3.y
+               ->  Hash
+                     Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+                     ->  Partial HashAggregate
+                           Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+                           Group Key: t1_3.x
+                           ->  Seq Scan on public.eager_agg_tab1_p3 t1_3
+                                 Output: t1_3.x
+(41 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+ x  | sum  | count 
+----+------+-------
+  4 | 1200 |    50
+ 14 | 1200 |    50
+ 18 |  900 |    50
+  2 |  600 |    50
+ 12 |  600 |    50
+  8 |  900 |    50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+                                     QUERY PLAN                                      
+-------------------------------------------------------------------------------------
+ Append
+   ->  Finalize HashAggregate
+         Output: t1.x, sum((t2.y + t3.y))
+         Group Key: t1.x
+         ->  Hash Join
+               Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+               Hash Cond: (t1.x = t2.x)
+               ->  Seq Scan on public.eager_agg_tab1_p1 t1
+                     Output: t1.x
+               ->  Hash
+                     Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+                     ->  Partial HashAggregate
+                           Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+                           Group Key: t2.x
+                           ->  Hash Join
+                                 Output: t2.y, t2.x, t3.y, t3.x
+                                 Hash Cond: (t2.x = t3.x)
+                                 ->  Seq Scan on public.eager_agg_tab1_p1 t2
+                                       Output: t2.y, t2.x
+                                 ->  Hash
+                                       Output: t3.y, t3.x
+                                       ->  Seq Scan on public.eager_agg_tab1_p1 t3
+                                             Output: t3.y, t3.x
+   ->  Finalize HashAggregate
+         Output: t1_1.x, sum((t2_1.y + t3_1.y))
+         Group Key: t1_1.x
+         ->  Hash Join
+               Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+               Hash Cond: (t1_1.x = t2_1.x)
+               ->  Seq Scan on public.eager_agg_tab1_p2 t1_1
+                     Output: t1_1.x
+               ->  Hash
+                     Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+                     ->  Partial HashAggregate
+                           Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+                           Group Key: t2_1.x
+                           ->  Hash Join
+                                 Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+                                 Hash Cond: (t2_1.x = t3_1.x)
+                                 ->  Seq Scan on public.eager_agg_tab1_p2 t2_1
+                                       Output: t2_1.y, t2_1.x
+                                 ->  Hash
+                                       Output: t3_1.y, t3_1.x
+                                       ->  Seq Scan on public.eager_agg_tab1_p2 t3_1
+                                             Output: t3_1.y, t3_1.x
+   ->  Finalize HashAggregate
+         Output: t1_2.x, sum((t2_2.y + t3_2.y))
+         Group Key: t1_2.x
+         ->  Hash Join
+               Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+               Hash Cond: (t1_2.x = t2_2.x)
+               ->  Seq Scan on public.eager_agg_tab1_p3 t1_2
+                     Output: t1_2.x
+               ->  Hash
+                     Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+                     ->  Partial HashAggregate
+                           Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+                           Group Key: t2_2.x
+                           ->  Hash Join
+                                 Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+                                 Hash Cond: (t2_2.x = t3_2.x)
+                                 ->  Seq Scan on public.eager_agg_tab1_p3 t2_2
+                                       Output: t2_2.y, t2_2.x
+                                 ->  Hash
+                                       Output: t3_2.y, t3_2.x
+                                       ->  Seq Scan on public.eager_agg_tab1_p3 t3_2
+                                             Output: t3_2.y, t3_2.x
+(67 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+ x  |  sum  
+----+-------
+  4 | 18000
+  2 | 14000
+  8 | 26000
+  6 | 22000
+  0 | 10000
+ 16 | 22000
+ 10 | 10000
+ 14 | 18000
+ 12 | 14000
+ 18 | 26000
+ 26 | 22000
+ 28 | 26000
+ 22 | 14000
+ 20 | 10000
+ 24 | 18000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+                                        QUERY PLAN                                         
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+   Output: t3.y, sum((t2.y + t3.y))
+   Group Key: t3.y
+   ->  Sort
+         Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+         Sort Key: t3.y
+         ->  Append
+               ->  Hash Join
+                     Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+                     Hash Cond: (t2_1.x = t1_1.x)
+                     ->  Partial GroupAggregate
+                           Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+                           Group Key: t3_1.y, t2_1.x, t3_1.x
+                           ->  Sort
+                                 Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+                                 Sort Key: t3_1.y, t2_1.x
+                                 ->  Hash Join
+                                       Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+                                       Hash Cond: (t2_1.x = t3_1.x)
+                                       ->  Seq Scan on public.eager_agg_tab1_p1 t2_1
+                                             Output: t2_1.y, t2_1.x
+                                       ->  Hash
+                                             Output: t3_1.y, t3_1.x
+                                             ->  Seq Scan on public.eager_agg_tab1_p1 t3_1
+                                                   Output: t3_1.y, t3_1.x
+                     ->  Hash
+                           Output: t1_1.x
+                           ->  Seq Scan on public.eager_agg_tab1_p1 t1_1
+                                 Output: t1_1.x
+               ->  Hash Join
+                     Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+                     Hash Cond: (t2_2.x = t1_2.x)
+                     ->  Partial GroupAggregate
+                           Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+                           Group Key: t3_2.y, t2_2.x, t3_2.x
+                           ->  Sort
+                                 Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+                                 Sort Key: t3_2.y, t2_2.x
+                                 ->  Hash Join
+                                       Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+                                       Hash Cond: (t2_2.x = t3_2.x)
+                                       ->  Seq Scan on public.eager_agg_tab1_p2 t2_2
+                                             Output: t2_2.y, t2_2.x
+                                       ->  Hash
+                                             Output: t3_2.y, t3_2.x
+                                             ->  Seq Scan on public.eager_agg_tab1_p2 t3_2
+                                                   Output: t3_2.y, t3_2.x
+                     ->  Hash
+                           Output: t1_2.x
+                           ->  Seq Scan on public.eager_agg_tab1_p2 t1_2
+                                 Output: t1_2.x
+               ->  Hash Join
+                     Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+                     Hash Cond: (t2_3.x = t1_3.x)
+                     ->  Partial GroupAggregate
+                           Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+                           Group Key: t3_3.y, t2_3.x, t3_3.x
+                           ->  Sort
+                                 Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+                                 Sort Key: t3_3.y, t2_3.x
+                                 ->  Hash Join
+                                       Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+                                       Hash Cond: (t2_3.x = t3_3.x)
+                                       ->  Seq Scan on public.eager_agg_tab1_p3 t2_3
+                                             Output: t2_3.y, t2_3.x
+                                       ->  Hash
+                                             Output: t3_3.y, t3_3.x
+                                             ->  Seq Scan on public.eager_agg_tab1_p3 t3_3
+                                                   Output: t3_3.y, t3_3.x
+                     ->  Hash
+                           Output: t1_3.x
+                           ->  Seq Scan on public.eager_agg_tab1_p3 t1_3
+                                 Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+ y  |  sum  
+----+-------
+  0 |  7500
+  2 | 13500
+  4 | 19500
+  6 | 25500
+  8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
+ Append
+   ->  Finalize HashAggregate
+         Output: t1.x, sum(t2.y), count(*)
+         Group Key: t1.x
+         ->  Hash Join
+               Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+               Hash Cond: (t1.x = t2.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p1 t1
+                     Output: t1.x
+               ->  Hash
+                     Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+                           Group Key: t2.x
+                           ->  Seq Scan on public.eager_agg_tab_ml_p1 t2
+                                 Output: t2.y, t2.x
+   ->  Finalize HashAggregate
+         Output: t1_1.x, sum(t2_1.y), count(*)
+         Group Key: t1_1.x
+         ->  Hash Join
+               Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+               Hash Cond: (t1_1.x = t2_1.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+                     Output: t1_1.x
+               ->  Hash
+                     Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+                           Group Key: t2_1.x
+                           ->  Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+                                 Output: t2_1.y, t2_1.x
+   ->  Finalize HashAggregate
+         Output: t1_2.x, sum(t2_2.y), count(*)
+         Group Key: t1_2.x
+         ->  Hash Join
+               Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+               Hash Cond: (t1_2.x = t2_2.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+                     Output: t1_2.x
+               ->  Hash
+                     Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+                           Group Key: t2_2.x
+                           ->  Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+                                 Output: t2_2.y, t2_2.x
+   ->  Finalize HashAggregate
+         Output: t1_3.x, sum(t2_3.y), count(*)
+         Group Key: t1_3.x
+         ->  Hash Join
+               Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+               Hash Cond: (t1_3.x = t2_3.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+                     Output: t1_3.x
+               ->  Hash
+                     Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+                           Group Key: t2_3.x
+                           ->  Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+                                 Output: t2_3.y, t2_3.x
+   ->  Finalize HashAggregate
+         Output: t1_4.x, sum(t2_4.y), count(*)
+         Group Key: t1_4.x
+         ->  Hash Join
+               Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+               Hash Cond: (t1_4.x = t2_4.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+                     Output: t1_4.x
+               ->  Hash
+                     Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+                           Group Key: t2_4.x
+                           ->  Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+                                 Output: t2_4.y, t2_4.x
+(76 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+ x  |  sum  | count 
+----+-------+-------
+  8 |  9248 |  1156
+  9 | 10404 |  1156
+  7 |  8092 |  1156
+  1 |  1156 |  1156
+  5 |  5780 |  1156
+  4 |  4624 |  1156
+  2 |  2312 |  1156
+  0 |     0 |  1089
+  6 |  6936 |  1156
+  3 |  3468 |  1156
+ 11 | 11979 |  1089
+ 13 | 14157 |  1089
+ 10 | 11560 |  1156
+ 14 | 15246 |  1089
+ 12 | 13068 |  1089
+ 17 | 18513 |  1089
+ 18 | 19602 |  1089
+ 16 | 17424 |  1089
+ 15 | 16335 |  1089
+ 19 | 20691 |  1089
+ 24 | 26136 |  1089
+ 21 | 22869 |  1089
+ 23 | 25047 |  1089
+ 22 | 23958 |  1089
+ 20 | 21780 |  1089
+ 26 | 28314 |  1089
+ 27 | 29403 |  1089
+ 25 | 27225 |  1089
+ 29 | 31581 |  1089
+ 28 | 30492 |  1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+                                   QUERY PLAN                                    
+---------------------------------------------------------------------------------
+ Finalize HashAggregate
+   Output: t1.y, sum(t2.y), count(*)
+   Group Key: t1.y
+   ->  Append
+         ->  Hash Join
+               Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+               Hash Cond: (t1_1.x = t2_1.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+                     Output: t1_1.y, t1_1.x
+               ->  Hash
+                     Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+                           Group Key: t2_1.x
+                           ->  Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+                                 Output: t2_1.y, t2_1.x
+         ->  Hash Join
+               Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+               Hash Cond: (t1_2.x = t2_2.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+                     Output: t1_2.y, t1_2.x
+               ->  Hash
+                     Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+                           Group Key: t2_2.x
+                           ->  Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+                                 Output: t2_2.y, t2_2.x
+         ->  Hash Join
+               Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+               Hash Cond: (t1_3.x = t2_3.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+                     Output: t1_3.y, t1_3.x
+               ->  Hash
+                     Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+                           Group Key: t2_3.x
+                           ->  Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+                                 Output: t2_3.y, t2_3.x
+         ->  Hash Join
+               Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+               Hash Cond: (t1_4.x = t2_4.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+                     Output: t1_4.y, t1_4.x
+               ->  Hash
+                     Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+                           Group Key: t2_4.x
+                           ->  Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+                                 Output: t2_4.y, t2_4.x
+         ->  Hash Join
+               Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+               Hash Cond: (t1_5.x = t2_5.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+                     Output: t1_5.y, t1_5.x
+               ->  Hash
+                     Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+                           Group Key: t2_5.x
+                           ->  Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+                                 Output: t2_5.y, t2_5.x
+(64 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+ y  |  sum  | count 
+----+-------+-------
+ 29 | 31581 |  1089
+  4 |  4624 |  1156
+  0 |     0 |  1089
+ 10 | 11560 |  1156
+  9 | 10404 |  1156
+  7 |  8092 |  1156
+ 15 | 16335 |  1089
+  6 |  6936 |  1156
+ 26 | 28314 |  1089
+ 12 | 13068 |  1089
+ 24 | 26136 |  1089
+ 19 | 20691 |  1089
+ 25 | 27225 |  1089
+ 21 | 22869 |  1089
+ 14 | 15246 |  1089
+  3 |  3468 |  1156
+ 17 | 18513 |  1089
+ 28 | 30492 |  1089
+ 22 | 23958 |  1089
+ 20 | 21780 |  1089
+ 13 | 14157 |  1089
+  1 |  1156 |  1156
+  5 |  5780 |  1156
+ 18 | 19602 |  1089
+  2 |  2312 |  1156
+ 16 | 17424 |  1089
+ 27 | 29403 |  1089
+ 23 | 25047 |  1089
+ 11 | 11979 |  1089
+  8 |  9248 |  1156
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+                                             QUERY PLAN                                             
+----------------------------------------------------------------------------------------------------
+ Append
+   ->  Finalize HashAggregate
+         Output: t1.x, sum((t2.y + t3.y)), count(*)
+         Group Key: t1.x
+         ->  Hash Join
+               Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+               Hash Cond: (t1.x = t2.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p1 t1
+                     Output: t1.x
+               ->  Hash
+                     Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+                           Group Key: t2.x
+                           ->  Hash Join
+                                 Output: t2.y, t2.x, t3.y, t3.x
+                                 Hash Cond: (t2.x = t3.x)
+                                 ->  Seq Scan on public.eager_agg_tab_ml_p1 t2
+                                       Output: t2.y, t2.x
+                                 ->  Hash
+                                       Output: t3.y, t3.x
+                                       ->  Seq Scan on public.eager_agg_tab_ml_p1 t3
+                                             Output: t3.y, t3.x
+   ->  Finalize HashAggregate
+         Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+         Group Key: t1_1.x
+         ->  Hash Join
+               Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+               Hash Cond: (t1_1.x = t2_1.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+                     Output: t1_1.x
+               ->  Hash
+                     Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+                           Group Key: t2_1.x
+                           ->  Hash Join
+                                 Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+                                 Hash Cond: (t2_1.x = t3_1.x)
+                                 ->  Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+                                       Output: t2_1.y, t2_1.x
+                                 ->  Hash
+                                       Output: t3_1.y, t3_1.x
+                                       ->  Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+                                             Output: t3_1.y, t3_1.x
+   ->  Finalize HashAggregate
+         Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+         Group Key: t1_2.x
+         ->  Hash Join
+               Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+               Hash Cond: (t1_2.x = t2_2.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+                     Output: t1_2.x
+               ->  Hash
+                     Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+                           Group Key: t2_2.x
+                           ->  Hash Join
+                                 Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+                                 Hash Cond: (t2_2.x = t3_2.x)
+                                 ->  Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+                                       Output: t2_2.y, t2_2.x
+                                 ->  Hash
+                                       Output: t3_2.y, t3_2.x
+                                       ->  Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+                                             Output: t3_2.y, t3_2.x
+   ->  Finalize HashAggregate
+         Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+         Group Key: t1_3.x
+         ->  Hash Join
+               Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+               Hash Cond: (t1_3.x = t2_3.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+                     Output: t1_3.x
+               ->  Hash
+                     Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+                           Group Key: t2_3.x
+                           ->  Hash Join
+                                 Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+                                 Hash Cond: (t2_3.x = t3_3.x)
+                                 ->  Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+                                       Output: t2_3.y, t2_3.x
+                                 ->  Hash
+                                       Output: t3_3.y, t3_3.x
+                                       ->  Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+                                             Output: t3_3.y, t3_3.x
+   ->  Finalize HashAggregate
+         Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+         Group Key: t1_4.x
+         ->  Hash Join
+               Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+               Hash Cond: (t1_4.x = t2_4.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+                     Output: t1_4.x
+               ->  Hash
+                     Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+                           Group Key: t2_4.x
+                           ->  Hash Join
+                                 Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+                                 Hash Cond: (t2_4.x = t3_4.x)
+                                 ->  Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+                                       Output: t2_4.y, t2_4.x
+                                 ->  Hash
+                                       Output: t3_4.y, t3_4.x
+                                       ->  Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+                                             Output: t3_4.y, t3_4.x
+(111 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+ x  |   sum   | count 
+----+---------+-------
+  8 |  628864 | 39304
+  9 |  707472 | 39304
+  7 |  550256 | 39304
+  1 |   78608 | 39304
+  5 |  393040 | 39304
+  4 |  314432 | 39304
+  2 |  157216 | 39304
+  0 |       0 | 35937
+  6 |  471648 | 39304
+  3 |  235824 | 39304
+ 11 |  790614 | 35937
+ 13 |  934362 | 35937
+ 10 |  786080 | 39304
+ 14 | 1006236 | 35937
+ 12 |  862488 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 16 | 1149984 | 35937
+ 15 | 1078110 | 35937
+ 19 | 1365606 | 35937
+ 24 | 1724976 | 35937
+ 21 | 1509354 | 35937
+ 23 | 1653102 | 35937
+ 22 | 1581228 | 35937
+ 20 | 1437480 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 25 | 1796850 | 35937
+ 29 | 2084346 | 35937
+ 28 | 2012472 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+                                                 QUERY PLAN                                                 
+------------------------------------------------------------------------------------------------------------
+ Finalize HashAggregate
+   Output: t3.y, sum((t2.y + t3.y)), count(*)
+   Group Key: t3.y
+   ->  Append
+         ->  Hash Join
+               Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+               Hash Cond: (t1_1.x = t2_1.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+                     Output: t1_1.x
+               ->  Hash
+                     Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+                           Group Key: t3_1.y, t2_1.x, t3_1.x
+                           ->  Hash Join
+                                 Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+                                 Hash Cond: (t2_1.x = t3_1.x)
+                                 ->  Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+                                       Output: t2_1.y, t2_1.x
+                                 ->  Hash
+                                       Output: t3_1.y, t3_1.x
+                                       ->  Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+                                             Output: t3_1.y, t3_1.x
+         ->  Hash Join
+               Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+               Hash Cond: (t1_2.x = t2_2.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+                     Output: t1_2.x
+               ->  Hash
+                     Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+                           Group Key: t3_2.y, t2_2.x, t3_2.x
+                           ->  Hash Join
+                                 Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+                                 Hash Cond: (t2_2.x = t3_2.x)
+                                 ->  Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+                                       Output: t2_2.y, t2_2.x
+                                 ->  Hash
+                                       Output: t3_2.y, t3_2.x
+                                       ->  Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+                                             Output: t3_2.y, t3_2.x
+         ->  Hash Join
+               Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+               Hash Cond: (t1_3.x = t2_3.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+                     Output: t1_3.x
+               ->  Hash
+                     Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+                           Group Key: t3_3.y, t2_3.x, t3_3.x
+                           ->  Hash Join
+                                 Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+                                 Hash Cond: (t2_3.x = t3_3.x)
+                                 ->  Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+                                       Output: t2_3.y, t2_3.x
+                                 ->  Hash
+                                       Output: t3_3.y, t3_3.x
+                                       ->  Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+                                             Output: t3_3.y, t3_3.x
+         ->  Hash Join
+               Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+               Hash Cond: (t1_4.x = t2_4.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+                     Output: t1_4.x
+               ->  Hash
+                     Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+                           Group Key: t3_4.y, t2_4.x, t3_4.x
+                           ->  Hash Join
+                                 Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+                                 Hash Cond: (t2_4.x = t3_4.x)
+                                 ->  Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+                                       Output: t2_4.y, t2_4.x
+                                 ->  Hash
+                                       Output: t3_4.y, t3_4.x
+                                       ->  Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+                                             Output: t3_4.y, t3_4.x
+         ->  Hash Join
+               Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+               Hash Cond: (t1_5.x = t2_5.x)
+               ->  Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+                     Output: t1_5.x
+               ->  Hash
+                     Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+                           Group Key: t3_5.y, t2_5.x, t3_5.x
+                           ->  Hash Join
+                                 Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+                                 Hash Cond: (t2_5.x = t3_5.x)
+                                 ->  Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+                                       Output: t2_5.y, t2_5.x
+                                 ->  Hash
+                                       Output: t3_5.y, t3_5.x
+                                       ->  Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+                                             Output: t3_5.y, t3_5.x
+(99 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+ y  |   sum   | count 
+----+---------+-------
+ 29 | 2084346 | 35937
+  4 |  314432 | 39304
+  0 |       0 | 35937
+ 10 |  786080 | 39304
+  9 |  707472 | 39304
+  7 |  550256 | 39304
+ 15 | 1078110 | 35937
+  6 |  471648 | 39304
+ 26 | 1868724 | 35937
+ 12 |  862488 | 35937
+ 24 | 1724976 | 35937
+ 19 | 1365606 | 35937
+ 25 | 1796850 | 35937
+ 21 | 1509354 | 35937
+ 14 | 1006236 | 35937
+  3 |  235824 | 39304
+ 17 | 1221858 | 35937
+ 28 | 2012472 | 35937
+ 22 | 1581228 | 35937
+ 20 | 1437480 | 35937
+ 13 |  934362 | 35937
+  1 |   78608 | 39304
+  5 |  393040 | 39304
+ 18 | 1293732 | 35937
+  2 |  157216 | 39304
+ 16 | 1149984 | 35937
+ 27 | 1940598 | 35937
+ 23 | 1653102 | 35937
+ 11 |  790614 | 35937
+  8 |  628864 | 39304
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1d8a414eea..250a9dba21 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # The stats test resets stats, so nothing else needing stats access can be in
 # this group.
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
 
 # event_trigger depends on create_am and cannot run concurrently with
 # any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..aba2c41557
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,205 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
-- 
2.31.0



^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Eager aggregation, take 3
@ 2026-05-29 15:55  Radim Marek <[email protected]>
  parent: Richard Guo <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Radim Marek @ 2026-05-29 15:55 UTC (permalink / raw)
  To: Richard Guo <[email protected]>; +Cc: pgsql-hackers

Hey Richard,

I might be out of my depth here - but while testing RegreSQL as
correctness/performance harness on PostgreSQL it picked up a problem with
the wrong-results case during eager aggregation.

It reproduces on current HEAD
(commit 2670cc298f42cd7b1c426bf7ccfb0652d8e0b347 now)
with enable_eager_aggregate enabled.

My testing environment
  - Linux aarch64, gcc 12 (Debian)
  - macOS arm64, Apple clang 21
    (PostgreSQL 19devel on aarch64-apple-darwin25.5.0)

== How to reproduce

  CREATE TEMP TABLE c(id int, country text);
  CREATE TEMP TABLE o(customer_id int);
  INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
  INSERT INTO o VALUES (1),(3);   -- only customers 1 and 3 have a row in o

  SELECT c.country, count(*) AS n
  FROM c
  WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
  GROUP BY c.country
  ORDER BY c.country;

Expected results (everywhere except master)

 country | n
---------+---
 DE      | 2
 US      | 1
(2 rows)

The actual result with enable_eager_aggregate = on (default)

 country | n
---------+---
 DE      | 0
 US      | 0
(2 rows)

With SET enable_eager_aggregate = off, the result is correct (DE=2, US=1),
as it is on PG18.

Query Plan

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=108.19..108.69 rows=200 width=40) (actual time=0.195..0.197
rows=2.00 loops=1)
   Sort Key: c.country
   Sort Method: quicksort  Memory: 25kB
   Buffers: local hit=2
   ->  Finalize HashAggregate  (cost=98.55..100.55 rows=200 width=40)
(actual time=0.183..0.186 rows=2.00 loops=1)
         Group Key: c.country
         Batches: 1  Memory Usage: 32kB
         Buffers: local hit=2
         ->  Hash Anti Join  (cost=52.75..95.37 rows=635 width=40) (actual
time=0.177..0.179 rows=3.00 loops=1)
               Hash Cond: (c.id = o.customer_id)
               Buffers: local hit=2
               ->  Seq Scan on c  (cost=0.00..22.70 rows=1270 width=36)
(actual time=0.024..0.025 rows=5.00 loops=1)
                     Buffers: local hit=1
               ->  Hash  (cost=50.25..50.25 rows=200 width=12) (actual
time=0.145..0.146 rows=2.00 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     Buffers: local hit=1
                     ->  Partial HashAggregate  (cost=48.25..50.25 rows=200
width=12) (actual time=0.122..0.123 rows=2.00 loops=1)
                           Group Key: o.customer_id
                           Batches: 1  Memory Usage: 32kB
                           Buffers: local hit=1
                           ->  Seq Scan on o  (cost=0.00..35.50 rows=2550
width=4) (actual time=0.002..0.003 rows=2.00 loops=1)
                                 Buffers: local hit=1
 Planning Time: 0.294 ms
 Execution Time: 0.255 ms
(24 rows)

If this is already known or in progress, apologies for the noise.

---

Radim

On Fri, 29 May 2026 at 17:25, Richard Guo <[email protected]> wrote:

> Hi All,
>
> Eager aggregation is a query optimization technique that partially
> pushes a group-by past a join, and finalizes it once all the relations
> are joined.  Eager aggregation reduces the number of input rows to the
> join and thus may result in a better overall plan.  This technique is
> thoroughly described in the 'Eager Aggregation and Lazy Aggregation'
> paper [1].
>
> Back in 2017, a patch set has been proposed by Antonin Houska to
> implement eager aggregation in thread [2].  However, it was at last
> withdrawn after entering the pattern of "please rebase thx" followed by
> rebasing and getting no feedback until "please rebase again thx".  A
> second attempt in 2022 unfortunately fell into the same pattern about
> one year ago and was eventually closed again [3].
>
> That patch set has included most of the necessary concepts to implement
> eager aggregation.  However, as far as I can see, it has several weak
> points that we need to address.  It introduces invasive changes to some
> core planner functions, such as make_join_rel().  And with such changes
> join_is_legal() would be performed three times for the same proposed
> join, which is not great.  Another weak point is that the complexity of
> join searching dramatically increases with the growing number of
> relations to be joined.  This occurs because when we generate partially
> aggregated paths, each path of the input relation is considered as an
> input path for the grouped paths.  As a result, the number of grouped
> paths we generate increases exponentially, leading to a significant
> explosion in computational complexity.  Other weak points include the
> lack of support for outer joins and partitionwise joins.  And during my
> review of the code, I came across several bugs (planning error or crash)
> that need to be addressed.
>
> I'd like to give it another take to implement eager aggregation, while
> borrowing lots of concepts and many chunks of codes from the previous
> patch set.  Please see attached.  I have chosen to use the term 'Eager
> Aggregation' from the paper [1] instead of 'Aggregation push-down', to
> differentiate the aggregation push-down technique in FDW.
>
> The patch has been split into small pieces to make the review easier.
>
> 0001 introduces the RelInfoList structure, which encapsulates both a
> list and a hash table, so that we can leverage the hash table for faster
> lookups not only for join relations but also for upper relations.  With
> eager aggregation, it is possible that we generate so many upper rels of
> type UPPERREL_PARTIAL_GROUP_AGG that a hash table can help a lot with
> lookups.
>
> 0002 introduces the RelAggInfo structure to store information needed to
> create grouped paths for base and join rels.  It also revises the
> RelInfoList related structures and functions so that they can be used
> with RelAggInfos.
>
> 0003 checks if eager aggregation is applicable, and if so, collects
> suitable aggregate expressions and grouping expressions in the query,
> and records them in root->agg_clause_list and root->group_expr_list
> respectively.
>
> 0004 implements the functions that check if eager aggregation is
> applicable for a given relation, and if so, create RelAggInfo structure
> for the relation, using the infos about aggregate expressions and
> grouping expressions we collected earlier.  In this patch, when we check
> if a target expression can act as grouping expression, we need to check
> if this expression can be known equal to other expressions due to ECs
> that can act as grouping expressions.  This patch leverages function
> exprs_known_equal() to achieve that, after enhancing this function to
> consider opfamily if provided.
>
> 0005 implements the functions that generate paths for grouped relations
> by adding sorted and hashed partial aggregation paths on top of paths of
> the plain base or join relations.  For sorted partial aggregation paths,
> we only consider any suitably-sorted input paths as well as sorting the
> cheapest-total path.  For hashed partial aggregation paths, we only
> consider the cheapest-total path as input.  By not considering other
> paths we can reduce the number of grouping paths as much as possible
> while still achieving reasonable results.
>
> 0006 builds grouped relations for each base relation if possible, and
> generates aggregation paths for the grouped base relations.
>
> 0007 builds grouped relations for each just-processed join relation if
> possible, and generates aggregation paths for the grouped join
> relations.  The changes made to make_join_rel() are relatively minor,
> with the addition of a new function make_grouped_join_rel(), which finds
> or creates a grouped relation for the just-processed joinrel, and
> generates grouped paths by joining a grouped input relation with a
> non-grouped input relation.
>
> The other way to generate grouped paths is by adding sorted and hashed
> partial aggregation paths on top of paths of the joinrel.  This occurs
> in standard_join_search(), after we've run set_cheapest() for the
> joinrel.  The reason for performing this step after set_cheapest() is
> that we need to know the joinrel's cheapest paths (see 0005).
>
> This patch also makes the grouped relation for the topmost join rel act
> as the upper rel representing the result of partial aggregation, so that
> we can add the final aggregation on top of that.  Additionally, this
> patch extends the functionality of eager aggregation to work with
> partitionwise join and geqo.
>
> This patch also makes eager aggregation work with outer joins.  With
> outer join, the aggregate cannot be pushed down if any column referenced
> by grouping expressions or aggregate functions is nullable by an outer
> join above the relation to which we want to apply the partiall
> aggregation.  Thanks to Tom's outer-join-aware-Var infrastructure, we
> can easily identify such situations and subsequently refrain from
> pushing down the aggregates.
>
> Starting from this patch, you should be able to see plans with eager
> aggregation.
>
> 0008 adds test cases for eager aggregation.
>
> 0009 adds a section in README that describes this feature (copied from
> previous patch set, with minor tweaks).
>
> Thoughts and comments are welcome.
>
> [1] https://www.vldb.org/conf/1995/P345.PDF
> [2] https://www.postgresql.org/message-id/flat/9666.1491295317%40localhost
> [3]
> https://www.postgresql.org/message-id/flat/OS3PR01MB66609589B896FBDE190209F495EE9%40OS3PR01MB6660.jp...
>
> Thanks
> Richard
>


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Eager aggregation, take 3
@ 2026-05-31 11:28  Tender Wang <[email protected]>
  parent: Radim Marek <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Tender Wang @ 2026-05-31 11:28 UTC (permalink / raw)
  To: Radim Marek <[email protected]>; +Cc: Richard Guo <[email protected]>; pgsql-hackers

Radim Marek <[email protected]> 于2026年5月29日周五 23:55写道:
>
> Hey Richard,
>
> I might be out of my depth here - but while testing RegreSQL as correctness/performance harness on PostgreSQL it picked up a problem with the wrong-results case during eager aggregation.
>
> It reproduces on current HEAD (commit 2670cc298f42cd7b1c426bf7ccfb0652d8e0b347 now) with enable_eager_aggregate enabled.
>
> My testing environment
>   - Linux aarch64, gcc 12 (Debian)
>   - macOS arm64, Apple clang 21
>     (PostgreSQL 19devel on aarch64-apple-darwin25.5.0)
>
> == How to reproduce
>
>   CREATE TEMP TABLE c(id int, country text);
>   CREATE TEMP TABLE o(customer_id int);
>   INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
>   INSERT INTO o VALUES (1),(3);   -- only customers 1 and 3 have a row in o
>
>   SELECT c.country, count(*) AS n
>   FROM c
>   WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
>   GROUP BY c.country
>   ORDER BY c.country;
>
> Expected results (everywhere except master)
>
>  country | n
> ---------+---
>  DE      | 2
>  US      | 1
> (2 rows)
>
> The actual result with enable_eager_aggregate = on (default)
>
>  country | n
> ---------+---
>  DE      | 0
>  US      | 0
> (2 rows)
>
> With SET enable_eager_aggregate = off, the result is correct (DE=2, US=1), as it is on PG18.
>
> Query Plan
>
>                                                             QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=108.19..108.69 rows=200 width=40) (actual time=0.195..0.197 rows=2.00 loops=1)
>    Sort Key: c.country
>    Sort Method: quicksort  Memory: 25kB
>    Buffers: local hit=2
>    ->  Finalize HashAggregate  (cost=98.55..100.55 rows=200 width=40) (actual time=0.183..0.186 rows=2.00 loops=1)
>          Group Key: c.country
>          Batches: 1  Memory Usage: 32kB
>          Buffers: local hit=2
>          ->  Hash Anti Join  (cost=52.75..95.37 rows=635 width=40) (actual time=0.177..0.179 rows=3.00 loops=1)
>                Hash Cond: (c.id = o.customer_id)
>                Buffers: local hit=2
>                ->  Seq Scan on c  (cost=0.00..22.70 rows=1270 width=36) (actual time=0.024..0.025 rows=5.00 loops=1)
>                      Buffers: local hit=1
>                ->  Hash  (cost=50.25..50.25 rows=200 width=12) (actual time=0.145..0.146 rows=2.00 loops=1)
>                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                      Buffers: local hit=1
>                      ->  Partial HashAggregate  (cost=48.25..50.25 rows=200 width=12) (actual time=0.122..0.123 rows=2.00 loops=1)
>                            Group Key: o.customer_id
>                            Batches: 1  Memory Usage: 32kB
>                            Buffers: local hit=1
>                            ->  Seq Scan on o  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.002..0.003 rows=2.00 loops=1)
>                                  Buffers: local hit=1
>  Planning Time: 0.294 ms
>  Execution Time: 0.255 ms
> (24 rows)
>
> If this is already known or in progress, apologies for the noise.
Thanks for the report. This is a bug.
When we use eager_agg, it can reduce many tuples before doing a join
on the partial agg side.
After partial agg, when we are doing a join,  the matched rows will be
significantly reduced.
This is also the effect we want to achieve from eager_agg.

But we should be careful about anti-join. Because we will ignore the
matched row. The aggregate of unmatched rows seems wrong.
And I can get the wrong results from the semi-join, too.
For example:
postgres=# CREATE TEMP TABLE c(id int, country text);
CREATE TEMP TABLE o(customer_id int);
INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
INSERT INTO o VALUES (1),(3);
CREATE TABLE
CREATE TABLE
INSERT 0 5
INSERT 0 2
postgres=# insert into o values (1);
INSERT 0 1
-- correct result
postgres=#  SELECT c.country, count(*) AS n
FROM c
WHERE EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
GROUP BY c.country
ORDER BY c.country;
 country | n
---------+---
 DE      | 1
 US      | 1
(2 rows)

I do some hacks that make the cost of the path created in
make_grouped_join_rel() very small.
So we can get a partial agg plan, as follow:

postgres=# explain SELECT c.country, count(*) AS n
FROM c
WHERE EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
GROUP BY c.country
ORDER BY c.country;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=31.56..38.32 rows=200 width=40)
   Group Key: c.country
   ->  Sort  (cost=31.56..33.15 rows=635 width=40)
         Sort Key: c.country
         ->  Hash Semi Join  (cost=1.00..2.00 rows=635 width=40)
               Hash Cond: (c.id = o.customer_id)
               ->  Seq Scan on c  (cost=0.00..22.70 rows=1270 width=36)
               ->  Hash  (cost=200.91..200.91 rows=200 width=12)
                     ->  Partial GroupAggregate  (cost=179.78..200.91
rows=200 width=12)
                           Group Key: o.customer_id
                           ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
                                 Sort Key: o.customer_id
                                 ->  Seq Scan on o  (cost=0.00..35.50
rows=2550 width=4)
(13 rows)

postgres=#  SELECT c.country, count(*) AS n
FROM c
WHERE EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
GROUP BY c.country
ORDER BY c.country;
 country | n
---------+---
 DE      | 1
 US      | 2
(2 rows)

You can see that the count(us) has 2. Because partial agg
pre-aggregates the results for country =1.
However, for the semantics of semi-join, it returns once a match is found.

I haven't thought about it too deeply yet. Maybe we can do something
in the make_grouped_join_rel().
...
if (sjinfo->jointype == JOIN_ANTI || sjinfo->jointype == JOIN_SEMI)
    return;
...
The fixes above can temporarily resolve these issues. But it seems too strict.

-- 
Thanks,
Tender Wang






^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Eager aggregation, take 3
@ 2026-06-01 07:19  Richard Guo <[email protected]>
  parent: Tender Wang <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Richard Guo @ 2026-06-01 07:19 UTC (permalink / raw)
  To: Tender Wang <[email protected]>; +Cc: Radim Marek <[email protected]>; pgsql-hackers

On Sun, May 31, 2026 at 8:28 PM Tender Wang <[email protected]> wrote:
> Radim Marek <[email protected]> 于2026年5月29日周五 23:55写道:
> > == How to reproduce
> >
> >   CREATE TEMP TABLE c(id int, country text);
> >   CREATE TEMP TABLE o(customer_id int);
> >   INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
> >   INSERT INTO o VALUES (1),(3);   -- only customers 1 and 3 have a row in o
> >
> >   SELECT c.country, count(*) AS n
> >   FROM c
> >   WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
> >   GROUP BY c.country
> >   ORDER BY c.country;
> >
> > Expected results (everywhere except master)
> >
> >  country | n
> > ---------+---
> >  DE      | 2
> >  US      | 1
> > (2 rows)
> >
> > The actual result with enable_eager_aggregate = on (default)
> >
> >  country | n
> > ---------+---
> >  DE      | 0
> >  US      | 0
> > (2 rows)

Thanks for the report.  This is a bug.  We should never push a partial
aggregation down to a relation on the inner (RHS) side of a semi/anti
join.  A semi/anti join does not preserve its inner rows in the join
output, so a partial aggregate computed on the inner side would not
survive the join and could not be combined by the final aggregation.

> I haven't thought about it too deeply yet. Maybe we can do something
> in the make_grouped_join_rel().
> ...
> if (sjinfo->jointype == JOIN_ANTI || sjinfo->jointype == JOIN_SEMI)
>     return;
> ...

That does fix the reported case, but I think it's too broad: it also
disables pushing a partial aggregate to the outer side of a semi/anti
join, which is valid.  And by the time we reach make_grouped_join_rel
the grouped relation for the inner-side relation has already been
built, so it would just go unused.

So I'd rather fix it in eager_aggregation_possible_for_relation, right
next to the existing outer-join check, by rejecting a relation that
lies on the inner side of a semijoin/antijoin.  See attached.

- Richard


Attachments:

  [application/octet-stream] v1-0001-Fix-eager-aggregation-for-semi-antijoin-inner-rel.patch (8.9K, 2-v1-0001-Fix-eager-aggregation-for-semi-antijoin-inner-rel.patch)
  download | inline diff:
From f243991b43614e56ef2be0bd5f0c92f807cdde3e Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Mon, 1 Jun 2026 14:49:33 +0900
Subject: [PATCH v1] Fix eager aggregation for semi/antijoin inner rels

Eager aggregation pushes a partial aggregate down to a base or join
relation, to be finalized after that relation is joined with the rest
of the query.  eager_aggregation_possible_for_relation() already
refuses to do this for a relation on the nullable side of an outer
join, but it failed to also refuse it for a relation on the inner side
of a semijoin or antijoin.

Such a join does not emit its inner rows, so a partial aggregate
computed on the inner side does not survive the join and cannot be
combined by the final aggregation.  This can happen only for an
aggregate that references no table column, such as count(*): it is
considered computable on any relation, including the inner one,
whereas an aggregate that references a column is anchored to the outer
side and never reaches the inner relation.

The existing outer-join check did not catch this because it consults
nulling_relids, which only tracks joins that null-extend their inner
side.  Semijoins and antijoins formed from EXISTS, IN, NOT EXISTS, or
NOT IN sublinks do not null-extend and carry no ojrelid, so they are
invisible to that check.

Fix by additionally rejecting any relation that includes inner-side
relations of a semijoin or antijoin but not the join's outer side.
Pushing a partial aggregate to the outer side of such a join, grouped
by the join key, remains valid and is still allowed.
---
 src/backend/optimizer/README                  | 11 +++
 src/backend/optimizer/util/relnode.c          | 26 ++++++
 src/test/regress/expected/eager_aggregate.out | 90 +++++++++++++++++++
 src/test/regress/sql/eager_aggregate.sql      | 26 ++++++
 4 files changed, 153 insertions(+)

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 6c35baceedb..78a307cc523 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1588,6 +1588,17 @@ aggregation.  Pushing partial aggregation in this case may result in
 the rows being grouped differently than expected, or produce incorrect
 values from the aggregate functions.
 
+Semi joins and anti joins impose a similar restriction.  Such a join
+does not preserve its inner rows in the join output, so a partial
+aggregate computed on the inner side would not survive the join and
+could not be combined by the final aggregation.  We therefore do not
+push partial aggregation down to the inner side of a semi/anti join.
+(An anti join reduced from an outer join null-extends its inner side,
+so that inner relation is already excluded by the outer-join condition
+above; the case specifically addressed here is a semi/anti join that
+does not null-extend its inner side, such as one formed from an
+EXISTS, IN, NOT EXISTS, or NOT IN sublink.)
+
 During the construction of the join tree, we evaluate each base or
 join relation to determine if eager aggregation can be applied.  If
 feasible, we create a separate RelOptInfo called a "grouped relation"
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3fc2c2f71d0..687e923c46c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -2845,6 +2845,32 @@ eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
 			return false;
 	}
 
+	/*
+	 * Similarly, we cannot push a partial aggregation down to a relation on
+	 * the inner (RHS) side of a semi/anti join.  A semi/anti join does not
+	 * preserve its inner rows in the join output, so a partial aggregate
+	 * computed on the inner side would not survive the join and could not be
+	 * combined by the final aggregation.
+	 *
+	 * Note that an anti join reduced from an outer join null-extends its
+	 * inner side, so that inner relation already carries nulling_relids and
+	 * is handled by the outer-join check above.  The case this check adds is
+	 * a semi/anti join that does not null-extend its inner side, such as one
+	 * formed from an EXISTS, IN, NOT EXISTS, or NOT IN sublink.
+	 */
+	foreach(lc, root->join_info_list)
+	{
+		SpecialJoinInfo *sjinfo = lfirst_node(SpecialJoinInfo, lc);
+
+		if (sjinfo->jointype != JOIN_SEMI && sjinfo->jointype != JOIN_ANTI)
+			continue;
+
+		/* rel includes inner-side rels of this join but not its outer side */
+		if (bms_overlap(rel->relids, sjinfo->min_righthand) &&
+			!bms_is_subset(sjinfo->min_lefthand, rel->relids))
+			return false;
+	}
+
 	/*
 	 * For now we don't try to support PlaceHolderVars.
 	 */
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
index 456d32eb13d..091ae48a92b 100644
--- a/src/test/regress/expected/eager_aggregate.out
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -466,6 +466,96 @@ GROUP BY t1.a ORDER BY t1.a;
                      ->  Seq Scan on eager_agg_t1 t1
 (9 rows)
 
+-- Eager aggregation must not push a partial aggregate onto the inner side of a
+-- SEMI or ANTI join
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, count(*)
+  FROM eager_agg_t2 t2
+  WHERE NOT EXISTS (SELECT 1 FROM eager_agg_t3 t3 WHERE t3.a = t2.a)
+GROUP BY t2.b ORDER BY t2.b;
+                         QUERY PLAN                         
+------------------------------------------------------------
+ Sort
+   Output: t2.b, (count(*))
+   Sort Key: t2.b
+   ->  HashAggregate
+         Output: t2.b, count(*)
+         Group Key: t2.b
+         ->  Hash Anti Join
+               Output: t2.b
+               Hash Cond: (t2.a = t3.a)
+               ->  Seq Scan on public.eager_agg_t2 t2
+                     Output: t2.a, t2.b, t2.c
+               ->  Hash
+                     Output: t3.a
+                     ->  Seq Scan on public.eager_agg_t3 t3
+                           Output: t3.a
+(15 rows)
+
+SELECT t2.b, count(*)
+  FROM eager_agg_t2 t2
+  WHERE NOT EXISTS (SELECT 1 FROM eager_agg_t3 t3 WHERE t3.a = t2.a)
+GROUP BY t2.b ORDER BY t2.b;
+ b | count 
+---+-------
+ 0 |   100
+ 1 |    99
+ 2 |    99
+ 3 |    99
+ 4 |    99
+ 5 |    99
+ 6 |    99
+ 7 |    99
+ 8 |    99
+ 9 |    99
+(10 rows)
+
+-- Eager aggregation may still push a partial aggregate onto the outer side of
+-- a SEMI or ANTI join
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, count(*)
+  FROM eager_agg_t2 t2
+  WHERE EXISTS (SELECT 1 FROM eager_agg_t1 t1 WHERE t1.b = t2.b)
+GROUP BY t2.b ORDER BY t2.b;
+                            QUERY PLAN                            
+------------------------------------------------------------------
+ Finalize GroupAggregate
+   Output: t2.b, count(*)
+   Group Key: t2.b
+   ->  Sort
+         Output: t2.b, (PARTIAL count(*))
+         Sort Key: t2.b
+         ->  Hash Right Semi Join
+               Output: t2.b, (PARTIAL count(*))
+               Hash Cond: (t1.b = t2.b)
+               ->  Seq Scan on public.eager_agg_t1 t1
+                     Output: t1.a, t1.b, t1.c
+               ->  Hash
+                     Output: t2.b, (PARTIAL count(*))
+                     ->  Partial HashAggregate
+                           Output: t2.b, PARTIAL count(*)
+                           Group Key: t2.b
+                           ->  Seq Scan on public.eager_agg_t2 t2
+                                 Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t2.b, count(*)
+  FROM eager_agg_t2 t2
+  WHERE EXISTS (SELECT 1 FROM eager_agg_t1 t1 WHERE t1.b = t2.b)
+GROUP BY t2.b ORDER BY t2.b;
+ b | count 
+---+-------
+ 1 |   100
+ 2 |   100
+ 3 |   100
+ 4 |   100
+ 5 |   100
+ 6 |   100
+ 7 |   100
+ 8 |   100
+ 9 |   100
+(9 rows)
+
 DROP TABLE eager_agg_t1;
 DROP TABLE eager_agg_t2;
 DROP TABLE eager_agg_t3;
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
index 53d9b377a64..7bca9c524da 100644
--- a/src/test/regress/sql/eager_aggregate.sql
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -177,6 +177,32 @@ SELECT t1.a, avg(t2.c) FILTER (WHERE random() > 0.5)
   JOIN eager_agg_t2 t2 ON t1.b = t2.b
 GROUP BY t1.a ORDER BY t1.a;
 
+-- Eager aggregation must not push a partial aggregate onto the inner side of a
+-- SEMI or ANTI join
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, count(*)
+  FROM eager_agg_t2 t2
+  WHERE NOT EXISTS (SELECT 1 FROM eager_agg_t3 t3 WHERE t3.a = t2.a)
+GROUP BY t2.b ORDER BY t2.b;
+
+SELECT t2.b, count(*)
+  FROM eager_agg_t2 t2
+  WHERE NOT EXISTS (SELECT 1 FROM eager_agg_t3 t3 WHERE t3.a = t2.a)
+GROUP BY t2.b ORDER BY t2.b;
+
+-- Eager aggregation may still push a partial aggregate onto the outer side of
+-- a SEMI or ANTI join
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, count(*)
+  FROM eager_agg_t2 t2
+  WHERE EXISTS (SELECT 1 FROM eager_agg_t1 t1 WHERE t1.b = t2.b)
+GROUP BY t2.b ORDER BY t2.b;
+
+SELECT t2.b, count(*)
+  FROM eager_agg_t2 t2
+  WHERE EXISTS (SELECT 1 FROM eager_agg_t1 t1 WHERE t1.b = t2.b)
+GROUP BY t2.b ORDER BY t2.b;
+
 DROP TABLE eager_agg_t1;
 DROP TABLE eager_agg_t2;
 DROP TABLE eager_agg_t3;
-- 
2.39.5 (Apple Git-154)



^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Eager aggregation, take 3
@ 2026-06-01 07:57  Tender Wang <[email protected]>
  parent: Richard Guo <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Tender Wang @ 2026-06-01 07:57 UTC (permalink / raw)
  To: Richard Guo <[email protected]>; +Cc: Radim Marek <[email protected]>; pgsql-hackers

Richard Guo <[email protected]> 于2026年6月1日周一 15:19写道:
>
> Thanks for the report.  This is a bug.  We should never push a partial
> aggregation down to a relation on the inner (RHS) side of a semi/anti
> join.  A semi/anti join does not preserve its inner rows in the join
> output, so a partial aggregate computed on the inner side would not
> survive the join and could not be combined by the final aggregation.
>
> > I haven't thought about it too deeply yet. Maybe we can do something
> > in the make_grouped_join_rel().
> > ...
> > if (sjinfo->jointype == JOIN_ANTI || sjinfo->jointype == JOIN_SEMI)
> >     return;
> > ...
>
> That does fix the reported case, but I think it's too broad: it also
> disables pushing a partial aggregate to the outer side of a semi/anti
> join, which is valid.  And by the time we reach make_grouped_join_rel
> the grouped relation for the inner-side relation has already been
> built, so it would just go unused.

Yes, checking only the jointype and concluding that partial agg is not allowed
will cause us to miss some optimization opportunities for the outer
side of the semi or anti-join.


> So I'd rather fix it in eager_aggregation_possible_for_relation, right
> next to the existing outer-join check, by rejecting a relation that
> lies on the inner side of a semijoin/antijoin.  See attached.

The attached LGTM.



-- 
Thanks,
Tender Wang






^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: Eager aggregation, take 3
@ 2026-06-03 01:25  Richard Guo <[email protected]>
  parent: Tender Wang <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Richard Guo @ 2026-06-03 01:25 UTC (permalink / raw)
  To: Tender Wang <[email protected]>; +Cc: Radim Marek <[email protected]>; pgsql-hackers

On Mon, Jun 1, 2026 at 4:57 PM Tender Wang <[email protected]> wrote:
> Richard Guo <[email protected]> 于2026年6月1日周一 15:19写道:
> > So I'd rather fix it in eager_aggregation_possible_for_relation, right
> > next to the existing outer-join check, by rejecting a relation that
> > lies on the inner side of a semijoin/antijoin.  See attached.

> The attached LGTM.

Thanks for the review.  Pushed.

And thanks to Radim for the report and the well-contained repro.

- Richard






^ permalink  raw  reply  [nested|flat] 6+ messages in thread


end of thread, other threads:[~2026-06-03 01:25 UTC | newest]

Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-03-04 08:27 Eager aggregation, take 3 Richard Guo <[email protected]>
2026-05-29 15:55 ` Radim Marek <[email protected]>
2026-05-31 11:28   ` Tender Wang <[email protected]>
2026-06-01 07:19     ` Richard Guo <[email protected]>
2026-06-01 07:57       ` Tender Wang <[email protected]>
2026-06-03 01:25         ` Richard Guo <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox