* Eager aggregation, take 3
From: Richard Guo @ 2024-03-04 08:27 UTC (permalink / raw)
To: pgsql-hackers
Hi All,
Eager aggregation is a query optimization technique that partially
pushes a group-by past a join, and finalizes it once all the relations
are joined. Eager aggregation reduces the number of input rows to the
join and thus may result in a better overall plan. This technique is
thoroughly described in the 'Eager Aggregation and Lazy Aggregation'
paper [1].
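To make the idea concrete, here is a minimal toy sketch in plain C (not planner code). It assumes a hypothetical orders table joined to a customers table and grouped by region; the names Order, customer_region, sum_by_region_lazy and sum_by_region_eager are all made up for illustration. The lazy variant feeds every order row into the join, while the eager variant first computes one partial sum per join key, joins those, and then finalizes per region.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CUSTOMERS 3
#define NUM_REGIONS 2

typedef struct
{
    int  customer_id;   /* join key */
    long amount;        /* aggregated column */
} Order;

/* customer id -> region id; stands in for the (hypothetical) customers table */
static const int customer_region[NUM_CUSTOMERS] = {0, 1, 0};

/* Lazy plan: join every order row to its customer, then aggregate by region. */
static size_t
sum_by_region_lazy(const Order *orders, size_t n, long out[NUM_REGIONS])
{
    size_t join_input_rows = 0;

    for (int r = 0; r < NUM_REGIONS; r++)
        out[r] = 0;
    for (size_t i = 0; i < n; i++)
    {
        assert(orders[i].customer_id >= 0 &&
               orders[i].customer_id < NUM_CUSTOMERS);
        join_input_rows++;      /* one row per order reaches the join */
        out[customer_region[orders[i].customer_id]] += orders[i].amount;
    }
    return join_input_rows;
}

/* Eager plan: partial aggregate by join key, join, then final aggregate. */
static size_t
sum_by_region_eager(const Order *orders, size_t n, long out[NUM_REGIONS])
{
    long   partial[NUM_CUSTOMERS] = {0};
    int    seen[NUM_CUSTOMERS] = {0};
    size_t join_input_rows = 0;

    /* Partial aggregation below the join: one group per customer_id. */
    for (size_t i = 0; i < n; i++)
    {
        assert(orders[i].customer_id >= 0 &&
               orders[i].customer_id < NUM_CUSTOMERS);
        partial[orders[i].customer_id] += orders[i].amount;
        seen[orders[i].customer_id] = 1;
    }

    /* Join the partial groups, then finalize by region. */
    for (int r = 0; r < NUM_REGIONS; r++)
        out[r] = 0;
    for (int c = 0; c < NUM_CUSTOMERS; c++)
    {
        if (!seen[c])
            continue;
        join_input_rows++;      /* one row per group reaches the join */
        out[customer_region[c]] += partial[c];
    }
    return join_input_rows;
}
```

Both functions return the number of rows that reach the join, so the reduction can be compared directly. Whether the reduction pays off in a real plan depends on how many groups the partial aggregation produces.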
Back in 2017, a patch set was proposed by Antonin Houska to
implement eager aggregation in thread [2]. However, it was eventually
withdrawn after entering the pattern of "please rebase thx" followed by
rebasing and getting no feedback until "please rebase again thx". A
second attempt in 2022 unfortunately fell into the same pattern about
one year ago and was eventually closed again [3].
That patch set included most of the necessary concepts to implement
eager aggregation. However, as far as I can see, it has several weak
points that we need to address. It introduces invasive changes to some
core planner functions, such as make_join_rel(). And with such changes
join_is_legal() would be performed three times for the same proposed
join, which is not great. Another weak point is that the complexity of
join searching dramatically increases with the growing number of
relations to be joined. This occurs because when we generate partially
aggregated paths, each path of the input relation is considered as an
input path for the grouped paths. As a result, the number of grouped
paths we generate increases exponentially, leading to a significant
explosion in computational complexity. Other weak points include the
lack of support for outer joins and partitionwise joins. And during my
review of the code, I came across several bugs (planning errors or crashes)
that need to be addressed.
I'd like to take another shot at implementing eager aggregation, while
borrowing lots of concepts and many chunks of code from the previous
patch set. Please see attached. I have chosen to use the term 'Eager
Aggregation' from the paper [1] instead of 'Aggregation push-down', to
differentiate it from the aggregation push-down technique in FDW.
The patch has been split into small pieces to make the review easier.
0001 introduces the RelInfoList structure, which encapsulates both a
list and a hash table, so that we can leverage the hash table for faster
lookups not only for join relations but also for upper relations. With
eager aggregation, it is possible that we generate so many upper rels of
type UPPERREL_PARTIAL_GROUP_AGG that a hash table can help a lot with
lookups.
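The list-plus-lazy-hash behaviour can be sketched outside the planner as follows. This is only a toy model: ToyRel, ToyRelList, toy_add and toy_find are hypothetical names, relids are modelled as a plain 64-bit mask rather than a Bitmapset, and a fixed-size open-addressing table stands in for dynahash. The only detail taken from the patch is that the hash is built on lookup once the list holds more than 32 items, and that new entries are always appended to the list.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HASH_THRESHOLD 32   /* mirrors the "> 32" test in find_rel_info() */
#define MAX_ITEMS 256       /* toy limit, an assumption of this sketch */
#define HASH_SLOTS 1024     /* power of two, well above MAX_ITEMS */

typedef struct
{
    uint64_t relids;        /* lookup key (toy stand-in for Relids) */
    int      payload;
} ToyRel;

typedef struct
{
    ToyRel  *items[MAX_ITEMS];  /* always maintained, in insertion order */
    int      nitems;
    ToyRel **hash;              /* NULL until the list grows "too long" */
} ToyRelList;

/* Multiplicative hash; the shift keeps 10 bits, i.e. 0..HASH_SLOTS-1. */
static unsigned
slot_for(uint64_t relids)
{
    return (unsigned) ((relids * 0x9E3779B97F4A7C15ULL) >> 54);
}

static void
hash_insert(ToyRel **hash, ToyRel *rel)
{
    unsigned s = slot_for(rel->relids);

    while (hash[s] != NULL)             /* linear probing */
        s = (s + 1) & (HASH_SLOTS - 1);
    hash[s] = rel;
}

/* Append to the list; also enter into the hash if one already exists. */
static void
toy_add(ToyRelList *list, ToyRel *rel)
{
    assert(list->nitems < MAX_ITEMS);
    list->items[list->nitems++] = rel;
    if (list->hash)
        hash_insert(list->hash, rel);
}

/* Linear scan for short lists, hash lookup once the threshold is passed. */
static ToyRel *
toy_find(ToyRelList *list, uint64_t relids)
{
    if (!list->hash && list->nitems > HASH_THRESHOLD)
    {
        list->hash = calloc(HASH_SLOTS, sizeof(ToyRel *));
        assert(list->hash != NULL);
        for (int i = 0; i < list->nitems; i++)
            hash_insert(list->hash, list->items[i]);
    }

    if (list->hash)
    {
        unsigned s = slot_for(relids);

        while (list->hash[s] != NULL)
        {
            if (list->hash[s]->relids == relids)
                return list->hash[s];
            s = (s + 1) & (HASH_SLOTS - 1);
        }
        return NULL;
    }

    for (int i = 0; i < list->nitems; i++)
    {
        if (list->items[i]->relids == relids)
            return list->items[i];
    }
    return NULL;
}
```

Keeping the list alongside the hash matters for callers that truncate the list back to a saved length, which is what the GEQO code in the patch relies on.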
0002 introduces the RelAggInfo structure to store information needed to
create grouped paths for base and join rels. It also revises the
RelInfoList related structures and functions so that they can be used
with RelAggInfos.
0003 checks if eager aggregation is applicable, and if so, collects
suitable aggregate expressions and grouping expressions in the query,
and records them in root->agg_clause_list and root->group_expr_list
respectively.
0004 implements the functions that check if eager aggregation is
applicable for a given relation, and if so, create a RelAggInfo structure
for the relation, using the information about aggregate expressions and
grouping expressions we collected earlier. In this patch, when we check
if a target expression can act as grouping expression, we need to check
if this expression can be known equal to other expressions due to ECs
that can act as grouping expressions. This patch leverages function
exprs_known_equal() to achieve that, after enhancing this function to
consider opfamily if provided.
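As a rough illustration of the "known equal via ECs" check, the sketch below models equivalence classes with a small union-find over integer expression ids. This is an assumption made for brevity: it is not how EquivalenceClass or exprs_known_equal() is implemented, it ignores the opfamily argument entirely, and the names ToyECs, ecs_merge, known_equal and can_act_as_grouping_key are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EXPRS 16

/* Toy equivalence classes: each expression id points at a representative. */
typedef struct
{
    int parent[MAX_EXPRS];
} ToyECs;

static void
ecs_init(ToyECs *ecs)
{
    for (int i = 0; i < MAX_EXPRS; i++)
        ecs->parent[i] = i;
}

static int
ecs_find(ToyECs *ecs, int x)
{
    assert(x >= 0 && x < MAX_EXPRS);
    while (ecs->parent[x] != x)
    {
        ecs->parent[x] = ecs->parent[ecs->parent[x]];   /* path halving */
        x = ecs->parent[x];
    }
    return x;
}

/* Record that a = b, e.g. from a join clause t1.a = t2.b. */
static void
ecs_merge(ToyECs *ecs, int a, int b)
{
    ecs->parent[ecs_find(ecs, a)] = ecs_find(ecs, b);
}

static bool
known_equal(ToyECs *ecs, int a, int b)
{
    return ecs_find(ecs, a) == ecs_find(ecs, b);
}

/*
 * A target expression can act as a grouping expression if it is known equal
 * to one of the query's grouping expressions.
 */
static bool
can_act_as_grouping_key(ToyECs *ecs, int expr,
                        const int *group_exprs, int ngroup_exprs)
{
    for (int i = 0; i < ngroup_exprs; i++)
    {
        if (known_equal(ecs, expr, group_exprs[i]))
            return true;
    }
    return false;
}
```

For example, with GROUP BY t1.a and a join clause t1.a = t2.b, the column t2.b could serve as the grouping key for a grouped relation built on t2.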
0005 implements the functions that generate paths for grouped relations
by adding sorted and hashed partial aggregation paths on top of paths of
the plain base or join relations. For sorted partial aggregation paths,
we only consider any suitably-sorted input paths as well as sorting the
cheapest-total path. For hashed partial aggregation paths, we only
consider the cheapest-total path as input. By not considering other
paths we can reduce the number of grouping paths as much as possible
while still achieving reasonable results.
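The pruning rule for sorted partial aggregation can be condensed into one small decision function. The sketch below restates the branch structure of generate_grouped_paths() in 0005 as a standalone helper; SortChoice and choose_sort_strategy are hypothetical names, and the pathkey comparison is reduced to two integers (how many leading grouping keys the input path already provides, and how many are needed).

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    SKIP_PATH,          /* do not build a grouped path from this input */
    USE_AS_IS,          /* input is already suitably sorted */
    FULL_SORT,          /* add an explicit Sort on top */
    INCREMENTAL_SORT    /* add an Incremental Sort on top */
} SortChoice;

static SortChoice
choose_sort_strategy(int presorted_keys, int needed_keys,
                     bool is_cheapest_total, bool incremental_sort_enabled)
{
    assert(presorted_keys >= 0 && presorted_keys <= needed_keys);

    /* Suitably-sorted input paths are always considered. */
    if (presorted_keys == needed_keys)
        return USE_AS_IS;

    /*
     * Otherwise only the cheapest-total path is sorted from scratch; other
     * paths are kept only if they are partially sorted and incremental sort
     * is enabled.
     */
    if (!is_cheapest_total &&
        (presorted_keys == 0 || !incremental_sort_enabled))
        return SKIP_PATH;

    /* Never consider both: full sort without presorted keys, else incremental. */
    if (presorted_keys == 0 || !incremental_sort_enabled)
        return FULL_SORT;

    return INCREMENTAL_SORT;
}
```

The hashed case needs no such function, since only the cheapest-total path is ever used as input there.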
0006 builds grouped relations for each base relation if possible, and
generates aggregation paths for the grouped base relations.
0007 builds grouped relations for each just-processed join relation if
possible, and generates aggregation paths for the grouped join
relations. The changes made to make_join_rel() are relatively minor,
with the addition of a new function make_grouped_join_rel(), which finds
or creates a grouped relation for the just-processed joinrel, and
generates grouped paths by joining a grouped input relation with a
non-grouped input relation.
The other way to generate grouped paths is by adding sorted and hashed
partial aggregation paths on top of paths of the joinrel. This occurs
in standard_join_search(), after we've run set_cheapest() for the
joinrel. The reason for performing this step after set_cheapest() is
that we need to know the joinrel's cheapest paths (see 0005).
This patch also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that. Additionally, this
patch extends the functionality of eager aggregation to work with
partitionwise join and geqo.
This patch also makes eager aggregation work with outer joins. With an
outer join, the aggregate cannot be pushed down if any column referenced
by grouping expressions or aggregate functions is nullable by an outer
join above the relation to which we want to apply the partial
aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
can easily identify such situations and subsequently refrain from
pushing down the aggregates.
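A simplified model of that check, under stated assumptions: each referenced column carries a bitmask of the outer joins that can null it (in the spirit of varnullingrels), and a relation's relids mask includes the outer joins already performed inside it. The names ToyVar and can_push_partial_agg are hypothetical, and plain 64-bit masks stand in for Bitmapsets; the actual test in the patch may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct
{
    uint64_t nullingrels;   /* outer joins that can null this column */
} ToyVar;

/*
 * Partial aggregation may be applied at 'rel_relids' only if no column used
 * by the grouping expressions or aggregates can be nulled by an outer join
 * that is not already part of the relation, i.e. one that sits above it.
 */
static bool
can_push_partial_agg(uint64_t rel_relids, const ToyVar *vars, size_t nvars)
{
    assert(vars != NULL || nvars == 0);

    for (size_t i = 0; i < nvars; i++)
    {
        if ((vars[i].nullingrels & ~rel_relids) != 0)
            return false;   /* nullable by an outer join above the rel */
    }
    return true;
}
```

In this model, a column nulled by outer join 3 blocks the push-down to a relation covering only base rels 1 and 2, but not to a join relation that already includes outer join 3.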
Starting from this patch, you should be able to see plans with eager
aggregation.
0008 adds test cases for eager aggregation.
0009 adds a section in README that describes this feature (copied from
previous patch set, with minor tweaks).
Thoughts and comments are welcome.
[1] https://www.vldb.org/conf/1995/P345.PDF
[2] https://www.postgresql.org/message-id/flat/9666.1491295317%40localhost
[3] https://www.postgresql.org/message-id/flat/OS3PR01MB66609589B896FBDE190209F495EE9%40OS3PR01MB6660.jp...
Thanks
Richard
Attachments:
[application/octet-stream] v1-0001-Introduce-RelInfoList-structure.patch (14.3K, 3-v1-0001-Introduce-RelInfoList-structure.patch)
From 542f02eb98b84dad9990c03bef792bb3e816fd23 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Mon, 19 Feb 2024 15:16:51 +0800
Subject: [PATCH v1 1/9] Introduce RelInfoList structure
This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
---
contrib/postgres_fdw/postgres_fdw.c | 3 +-
src/backend/optimizer/geqo/geqo_eval.c | 20 +--
src/backend/optimizer/path/allpaths.c | 7 +-
src/backend/optimizer/plan/planmain.c | 5 +-
src/backend/optimizer/util/relnode.c | 164 ++++++++++++++-----------
src/include/nodes/pathnodes.h | 31 +++--
6 files changed, 133 insertions(+), 97 deletions(-)
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 142dcfc995..f46fc604b4 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
*/
Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */
fpinfo->relation_index =
- list_length(root->parse->rtable) + list_length(root->join_rel_list);
+ list_length(root->parse->rtable) +
+ list_length(root->join_rel_list->items);
return true;
}
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index d2f7f4e5f3..1141156899 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* truncating the list to its original length. NOTE this assumes that any
* added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_hash, if there
- * is one. We can do this by just temporarily setting the link to NULL.
- * (If we are dealing with enough join rels, which we very likely are, a
- * new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer join_rel_list->hash, if
+ * there is one. We can do this by just temporarily setting the link to
+ * NULL. (If we are dealing with enough join rels, which we very likely
+ * are, a new hash table will get built and used locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list);
- savehash = root->join_rel_hash;
+ savelength = list_length(root->join_rel_list->items);
+ savehash = root->join_rel_list->hash;
Assert(root->join_rel_level == NULL);
- root->join_rel_hash = NULL;
+ root->join_rel_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
* Restore join_rel_list to its former state, and put back original
* hashtable if any.
*/
- root->join_rel_list = list_truncate(root->join_rel_list,
- savelength);
- root->join_rel_hash = savehash;
+ root->join_rel_list->items = list_truncate(root->join_rel_list->items,
+ savelength);
+ root->join_rel_list->hash = savehash;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d404fbf262..351bf2e9e4 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3413,9 +3413,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist)
* needed for these paths need have been instantiated.
*
* Note to plugin authors: the functions invoked during standard_join_search()
- * modify root->join_rel_list and root->join_rel_hash. If you want to do more
- * than one join-order search, you'll probably need to save and restore the
- * original states of those data structures. See geqo_eval() for an example.
+ * modify root->join_rel_list->items and root->join_rel_list->hash. If you
+ * want to do more than one join-order search, you'll probably need to save and
+ * restore the original states of those data structures. See geqo_eval() for
+ * an example.
*/
RelOptInfo *
standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index ca47c7d310..3341e64d2b 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -65,8 +65,9 @@ query_planner(PlannerInfo *root,
* NOTE: append_rel_list was set up by subquery_planner, so do not touch
* here.
*/
- root->join_rel_list = NIL;
- root->join_rel_hash = NULL;
+ root->join_rel_list = makeNode(RelInfoList);
+ root->join_rel_list->items = NIL;
+ root->join_rel_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e5f4062bfb..9e25750acd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -35,11 +35,15 @@
#include "utils/lsyscache.h"
-typedef struct JoinHashEntry
+/*
+ * An entry of a hash table that we use to make lookup for RelOptInfo
+ * structures more efficient.
+ */
+typedef struct RelInfoEntry
{
- Relids join_relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *join_rel;
-} JoinHashEntry;
+ Relids relids; /* hash key --- MUST BE FIRST */
+ RelOptInfo *rel;
+} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
RelOptInfo *input_rel,
@@ -472,11 +476,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
}
/*
- * build_join_rel_hash
- * Construct the auxiliary hash table for join relations.
+ * build_rel_hash
+ * Construct the auxiliary hash table for relations.
*/
static void
-build_join_rel_hash(PlannerInfo *root)
+build_rel_hash(RelInfoList *list)
{
HTAB *hashtab;
HASHCTL hash_ctl;
@@ -484,47 +488,49 @@ build_join_rel_hash(PlannerInfo *root)
/* Create the hash table */
hash_ctl.keysize = sizeof(Relids);
- hash_ctl.entrysize = sizeof(JoinHashEntry);
+ hash_ctl.entrysize = sizeof(RelInfoEntry);
hash_ctl.hash = bitmap_hash;
hash_ctl.match = bitmap_match;
hash_ctl.hcxt = CurrentMemoryContext;
- hashtab = hash_create("JoinRelHashTable",
+ hashtab = hash_create("RelHashTable",
256L,
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing joinrels */
- foreach(l, root->join_rel_list)
+ /* Insert all the already-existing relations */
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
bool found;
- hentry = (JoinHashEntry *) hash_search(hashtab,
- &(rel->relids),
- HASH_ENTER,
- &found);
+ hentry = (RelInfoEntry *) hash_search(hashtab,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
Assert(!found);
- hentry->join_rel = rel;
+ hentry->rel = rel;
}
- root->join_rel_hash = hashtab;
+ list->hash = hashtab;
}
/*
- * find_join_rel
- * Returns relation entry corresponding to 'relids' (a set of RT indexes),
- * or NULL if none exists. This is for join relations.
+ * find_rel_info
+ * Find a RelOptInfo entry.
*/
-RelOptInfo *
-find_join_rel(PlannerInfo *root, Relids relids)
+static RelOptInfo *
+find_rel_info(RelInfoList *list, Relids relids)
{
+ if (list == NULL)
+ return NULL;
+
/*
* Switch to using hash lookup when list grows "too long". The threshold
* is arbitrary and is known only here.
*/
- if (!root->join_rel_hash && list_length(root->join_rel_list) > 32)
- build_join_rel_hash(root);
+ if (!list->hash && list_length(list->items) > 32)
+ build_rel_hash(list);
/*
* Use either hashtable lookup or linear search, as appropriate.
@@ -534,23 +540,23 @@ find_join_rel(PlannerInfo *root, Relids relids)
* so would force relids out of a register and thus probably slow down the
* list-search case.
*/
- if (root->join_rel_hash)
+ if (list->hash)
{
Relids hashkey = relids;
- JoinHashEntry *hentry;
+ RelInfoEntry *hentry;
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &hashkey,
- HASH_FIND,
- NULL);
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &hashkey,
+ HASH_FIND,
+ NULL);
if (hentry)
- return hentry->join_rel;
+ return hentry->rel;
}
else
{
ListCell *l;
- foreach(l, root->join_rel_list)
+ foreach(l, list->items)
{
RelOptInfo *rel = (RelOptInfo *) lfirst(l);
@@ -562,6 +568,54 @@ find_join_rel(PlannerInfo *root, Relids relids)
return NULL;
}
+/*
+ * find_join_rel
+ * Returns relation entry corresponding to 'relids' (a set of RT indexes),
+ * or NULL if none exists. This is for join relations.
+ */
+RelOptInfo *
+find_join_rel(PlannerInfo *root, Relids relids)
+{
+ return find_rel_info(root->join_rel_list, relids);
+}
+
+/*
+ * add_rel_info
+ * Add given relation to the given list. Also add it to the auxiliary
+ * hashtable if there is one.
+ */
+static void
+add_rel_info(RelInfoList *list, RelOptInfo *rel)
+{
+ /* GEQO requires us to append the new relation to the end of the list! */
+ list->items = lappend(list->items, rel);
+
+ /* store it into the auxiliary hashtable if there is one. */
+ if (list->hash)
+ {
+ RelInfoEntry *hentry;
+ bool found;
+
+ hentry = (RelInfoEntry *) hash_search(list->hash,
+ &(rel->relids),
+ HASH_ENTER,
+ &found);
+ Assert(!found);
+ hentry->rel = rel;
+ }
+}
+
+/*
+ * add_join_rel
+ * Add given join relation to the list of join relations in the given
+ * PlannerInfo.
+ */
+static void
+add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
+{
+ add_rel_info(root->join_rel_list, joinrel);
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -611,32 +665,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel,
}
}
-/*
- * add_join_rel
- * Add given join relation to the list of join relations in the given
- * PlannerInfo. Also add it to the auxiliary hashtable if there is one.
- */
-static void
-add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
-{
- /* GEQO requires us to append the new joinrel to the end of the list! */
- root->join_rel_list = lappend(root->join_rel_list, joinrel);
-
- /* store it into the auxiliary hashtable if there is one. */
- if (root->join_rel_hash)
- {
- JoinHashEntry *hentry;
- bool found;
-
- hentry = (JoinHashEntry *) hash_search(root->join_rel_hash,
- &(joinrel->relids),
- HASH_ENTER,
- &found);
- Assert(!found);
- hentry->join_rel = joinrel;
- }
-}
-
/*
* build_join_rel
* Returns relation entry corresponding to the union of two given rels,
@@ -1462,22 +1490,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel,
RelOptInfo *
fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
{
+ RelInfoList *list = &root->upper_rels[kind];
RelOptInfo *upperrel;
- ListCell *lc;
-
- /*
- * For the moment, our indexing data structure is just a List for each
- * relation kind. If we ever get so many of one kind that this stops
- * working well, we can improve it. No code outside this function should
- * assume anything about how to find a particular upperrel.
- */
/* If we already made this upperrel for the query, return it */
- foreach(lc, root->upper_rels[kind])
+ if (list)
{
- upperrel = (RelOptInfo *) lfirst(lc);
-
- if (bms_equal(upperrel->relids, relids))
+ upperrel = find_rel_info(list, relids);
+ if (upperrel)
return upperrel;
}
@@ -1496,7 +1516,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
upperrel->cheapest_unique_path = NULL;
upperrel->cheapest_parameterized_paths = NIL;
- root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel);
+ add_rel_info(&root->upper_rels[kind], upperrel);
return upperrel;
}
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 534692bee1..be51e2c652 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -80,6 +80,25 @@ typedef enum UpperRelationKind
/* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */
} UpperRelationKind;
+/*
+ * Hashed list to store relation specific info and to retrieve it by relids.
+ *
+ * For small problems we just scan the list to do lookups, but when there are
+ * many relations we build a hash table for faster lookups. The hash table is
+ * present and valid when 'hash' is not NULL. Note that we still maintain the
+ * list even when using the hash table for lookups; this simplifies life for
+ * GEQO.
+ */
+typedef struct RelInfoList
+{
+ pg_node_attr(no_copy_equal, no_read)
+
+ NodeTag type;
+
+ List *items;
+ struct HTAB *hash pg_node_attr(read_write_ignore);
+} RelInfoList;
+
/*----------
* PlannerGlobal
* Global information for planning/optimization
@@ -267,15 +286,9 @@ struct PlannerInfo
/*
* join_rel_list is a list of all join-relation RelOptInfos we have
- * considered in this planning run. For small problems we just scan the
- * list to do lookups, but when there are many join relations we build a
- * hash table for faster lookups. The hash table is present and valid
- * when join_rel_hash is not NULL. Note that we still maintain the list
- * even when using the hash table for lookups; this simplifies life for
- * GEQO.
+ * considered in this planning run.
*/
- List *join_rel_list;
- struct HTAB *join_rel_hash pg_node_attr(read_write_ignore);
+ RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */
/*
* When doing a dynamic-programming-style join search, join_rel_level[k]
@@ -408,7 +421,7 @@ struct PlannerInfo
* Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular
* upper rel.
*/
- List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
+ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
--
2.31.0
[application/octet-stream] v1-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch (13.1K, 4-v1-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch)
From 6b3b7a944bbb018e77dd8e4b787b9c660a9ed69b Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 14:19:39 +0800
Subject: [PATCH v1 5/9] Implement functions that generate paths for grouped
relations
This commit implements the functions that generate paths for grouped
relations by adding sorted and hashed partial aggregation paths on top
of paths of the plain base or join relations.
---
src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++
src/backend/optimizer/util/pathnode.c | 12 +-
src/include/optimizer/paths.h | 4 +
3 files changed, 315 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 9384c54ed9..f47ad04846 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -41,6 +41,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
@@ -50,6 +51,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -3306,6 +3308,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the plain base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel_plain))
+ {
+ mark_dummy_rel(rel_grouped);
+ return;
+ }
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations of
+ * grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel_plain->pathlist != NIL)
+ {
+ cheapest_total_path = rel_plain->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for rel_grouped, then we should consider
+ * generating partially-grouped partial paths. However, if the plain rel
+ * has no partial paths, then we can't.
+ */
+ if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel_plain->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel_plain->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel_plain->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ rel_grouped,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(rel_grouped, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from the non-grouped relation which is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for the partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ rel_grouped,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ rel_grouped,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(rel_grouped, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 6f79b2e3fe..ee455f7ec2 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2699,8 +2699,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -2952,8 +2951,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -2999,8 +2997,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3158,8 +3155,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
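
The four hunks above stop forcing param_info to NULL in the projection, sort, incremental sort and aggregation path constructors, and copy it from the subpath instead. A minimal sketch of that idea follows, using toy types rather than the planner's real structures. ToyParamInfo, ToyPath and both helper functions are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ParamPathInfo: which outer rels a path needs. */
typedef struct ToyParamInfo
{
    uint64_t    required_outer; /* bit i set means RT index i is required */
} ToyParamInfo;

/* Hypothetical stand-in for Path. */
typedef struct ToyPath
{
    const ToyParamInfo *param_info; /* NULL means unparameterized */
    const struct ToyPath *subpath;  /* NULL for a scan */
} ToyPath;

/*
 * Build a wrapper (e.g. a sort or a partial aggregate) on top of 'subpath'.
 * The wrapper adds no parameterization of its own; it keeps whatever the
 * subpath already requires.
 */
static ToyPath
toy_make_wrapper_path(const ToyPath *subpath)
{
    ToyPath     path;

    assert(subpath != NULL);

    path.param_info = subpath->param_info;
    path.subpath = subpath;
    return path;
}

/* Outer rels required by 'path', or 0 if it is unparameterized. */
static uint64_t
toy_required_outer(const ToyPath *path)
{
    return path->param_info ? path->param_info->required_outer : 0;
}
```

The removed comment assumed these nodes always sit above all joins. Once a sort or aggregate can be placed below a join, as eager aggregation presumably requires, dropping the subpath's parameterization would misdescribe the path.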
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index dcea10888b..68fc05432c 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
--
2.31.0
[application/octet-stream] v1-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch (26.1K, 5-v1-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch)
From a7658376eb1461132627825f4deabb73a4e53d1d Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 11:27:49 +0800
Subject: [PATCH v1 4/9] Implement functions that create RelAggInfos if
applicable
This commit implements the functions that check if eager aggregation is
applicable for a given relation and, if so, create a RelAggInfo structure
for the relation, using the information about aggregate expressions and
grouping expressions collected earlier.
---
src/backend/optimizer/path/equivclass.c | 26 +-
src/backend/optimizer/plan/planmain.c | 3 +
src/backend/optimizer/util/relnode.c | 624 ++++++++++++++++++++++++
src/backend/utils/adt/selfuncs.c | 5 +-
src/include/nodes/pathnodes.h | 6 +
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 3 +-
7 files changed, 662 insertions(+), 10 deletions(-)
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 4bd60a09c6..1890dbb852 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -2439,15 +2439,17 @@ find_join_domain(PlannerInfo *root, Relids relids)
* Detect whether two expressions are known equal due to equivalence
* relationships.
*
- * Actually, this only shows that the expressions are equal according
- * to some opfamily's notion of equality --- but we only use it for
- * selectivity estimation, so a fuzzy idea of equality is OK.
+ * If opfamily is given, the expressions must be known equal per the semantics
+ * of that opfamily (note it has to be a btree opfamily, since those are the
+ * only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll
+ * return true if they're equal according to any opfamily, which is fuzzy but
+ * OK for estimation purposes.
*
* Note: does not bother to check for "equal(item1, item2)"; caller must
* check that case if it's possible to pass identical items.
*/
bool
-exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
+exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily)
{
ListCell *lc1;
@@ -2462,6 +2464,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
if (ec->ec_has_volatile)
continue;
+ /*
+ * It's okay to consider ec_broken ECs here. Brokenness just means we
+ * couldn't derive all the implied clauses we'd have liked to; it does
+ * not invalidate our knowledge that the members are equal.
+ */
+
+ /* Ignore if this EC doesn't use specified opfamily */
+ if (OidIsValid(opfamily) &&
+ !list_member_oid(ec->ec_opfamilies, opfamily))
+ continue;
+
foreach(lc2, ec->ec_members)
{
EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2);
@@ -2490,8 +2503,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2)
* (In principle there might be more than one matching eclass if multiple
* collations are involved, but since collation doesn't matter for equality,
* we ignore that fine point here.) This is much like exprs_known_equal,
- * except that we insist on the comparison operator matching the eclass, so
- * that the result is definite not approximate.
+ * except for the format of the input.
*
* On success, we also set fkinfo->eclass[colno] to the matching eclass,
* and set fkinfo->fk_eclass_member[colno] to the eclass member for the
@@ -2532,7 +2544,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root,
/* Never match to a volatile EC */
if (ec->ec_has_volatile)
continue;
- /* Note: it seems okay to match to "broken" eclasses here */
+ /* It's okay to consider "broken" ECs here, see exprs_known_equal */
foreach(lc2, ec->ec_members)
{
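
The new opfamily argument to exprs_known_equal() can be illustrated with a small standalone model. This is only a sketch under simplifying assumptions: ToyEC is a hypothetical, flattened stand-in for EquivalenceClass, expressions are plain integer ids, and child members are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical, flattened stand-in for EquivalenceClass. */
typedef struct ToyEC
{
    bool        has_volatile;
    const Oid  *opfamilies;     /* opfamilies this EC is valid for */
    size_t      nopfamilies;
    const int  *members;        /* expression ids, standing in for Nodes */
    size_t      nmembers;
} ToyEC;

static bool
toy_oid_member(const Oid *oids, size_t n, Oid oid)
{
    for (size_t i = 0; i < n; i++)
        if (oids[i] == oid)
            return true;
    return false;
}

static bool
toy_expr_member(const int *exprs, size_t n, int expr)
{
    for (size_t i = 0; i < n; i++)
        if (exprs[i] == expr)
            return true;
    return false;
}

/*
 * Return true if item1 and item2 are members of the same usable EC.
 * With opfamily == InvalidOid any EC qualifies (the fuzzy mode used for
 * estimation); otherwise only ECs that list that opfamily are considered.
 */
static bool
toy_exprs_known_equal(const ToyEC *ecs, size_t necs,
                      int item1, int item2, Oid opfamily)
{
    assert(ecs != NULL || necs == 0);

    for (size_t i = 0; i < necs; i++)
    {
        const ToyEC *ec = &ecs[i];

        /* Never match to a volatile EC */
        if (ec->has_volatile)
            continue;

        /* With a valid opfamily, skip ECs that do not use it */
        if (opfamily != InvalidOid &&
            !toy_oid_member(ec->opfamilies, ec->nopfamilies, opfamily))
            continue;

        if (toy_expr_member(ec->members, ec->nmembers, item1) &&
            toy_expr_member(ec->members, ec->nmembers, item2))
            return true;
    }

    return false;
}
```

In this model a caller that only needs an estimate passes InvalidOid, while a caller that depends on exact grouping semantics passes the specific opfamily and gets a match only from ECs that list it.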
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 8b8def21ca..db66a3e189 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -68,6 +68,9 @@ query_planner(PlannerInfo *root,
root->join_rel_list = makeNode(RelInfoList);
root->join_rel_list->items = NIL;
root->join_rel_list->hash = NULL;
+ root->agg_info_list = makeNode(RelInfoList);
+ root->agg_info_list->items = NIL;
+ root->agg_info_list->hash = NULL;
root->join_rel_level = NULL;
root->join_cur_level = 0;
root->canon_pathkeys = NIL;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index c88da963db..e7f465ef7b 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -87,6 +87,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -640,6 +648,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel)
add_rel_info(root->join_rel_list, joinrel);
}
+/*
+ * add_grouped_rel
+ * Add grouped base or join relation to the list of grouped relations in
+ * the given PlannerInfo. Also add the corresponding RelAggInfo to
+ * root->agg_info_list.
+ */
+void
+add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel);
+ add_rel_info(root->agg_info_list, agg_info);
+}
+
+/*
+ * find_grouped_rel
+ * Returns grouped relation entry (base or join relation) corresponding to
+ * 'relids' or NULL if none exists.
+ *
+ * If agg_info_p is not NULL, the corresponding RelAggInfo (if one exists)
+ * is also returned in *agg_info_p.
+ */
+RelOptInfo *
+find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel;
+
+ rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG],
+ relids);
+ if (rel == NULL)
+ {
+ if (agg_info_p)
+ *agg_info_p = NULL;
+
+ return NULL;
+ }
+
+ /* also return the corresponding RelAggInfo, if asked */
+ if (agg_info_p)
+ {
+ RelAggInfo *agg_info;
+
+ agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids);
+
+ /* The relation exists, so the agg_info should be there too. */
+ Assert(agg_info != NULL);
+
+ *agg_info_p = agg_info;
+ }
+
+ return rel;
+}
+
/*
* set_foreign_rel_properties
* Set up foreign-join fields if outer and inner relation are foreign
@@ -2464,3 +2524,567 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Check if the given relation can produce grouped paths and, if so, return
+ * the information needed to create them. The given relation is the
+ * non-grouped one which has the reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *grp_exprs_extra = NIL;
+ List *group_clauses_final;
+ int i;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must already
+ * have been created if that was possible. So we can just use the parent's
+ * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+ Relids parent_relids = NULL;
+ AppendRelInfo **appinfos;
+ int nappinfos;
+ int cnt;
+
+ appinfos = find_appinfos_by_relids(root, rel->relids, &nappinfos);
+
+ for (cnt = 0; cnt < nappinfos; cnt++)
+ parent_relids = bms_add_member(parent_relids,
+ appinfos[cnt]->parent_relid);
+
+ Assert(!bms_is_empty(parent_relids));
+ rel_grouped = find_grouped_rel(root, parent_relids, &agg_info);
+
+ if (rel_grouped == NULL)
+ return NULL;
+
+ Assert(agg_info != NULL);
+
+ agg_info = (RelAggInfo *) adjust_appendrel_attrs(root,
+ (Node *) agg_info,
+ nappinfos,
+ appinfos);
+
+ pfree(appinfos);
+
+ agg_info->input_rows = rel->rows;
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ agg_info->input_rows, NULL, NULL);
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* initialize 'target' and 'agg_input' */
+ if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra))
+ return NULL;
+
+ /* Eager aggregation makes no sense w/o grouping expressions */
+ if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0)
+ return NULL;
+
+ group_clauses_final = root->parse->groupClause;
+
+ /*
+ * If the aggregation target should have extra grouping expressions (in
+ * order to emit input vars for join conditions), add them now. This step
+ * includes assignment of tleSortGroupRef's which we can generate now.
+ */
+ if (list_length(grp_exprs_extra) > 0)
+ {
+ Index sortgroupref;
+
+ /*
+ * Make a copy of the group clauses as we'll need to add some more
+ * clauses.
+ */
+ group_clauses_final = list_copy(group_clauses_final);
+
+ /* find out the current max sortgroupref */
+ sortgroupref = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > sortgroupref)
+ sortgroupref = ref;
+ }
+
+ /*
+ * Generate the SortGroupClause's and add the expressions to the
+ * target.
+ */
+ foreach(lc, grp_exprs_extra)
+ {
+ Var *var = lfirst_node(Var, lc);
+ SortGroupClause *cl = makeNode(SortGroupClause);
+
+ /*
+ * Initialize the SortGroupClause.
+ *
+ * As the final aggregation will not use this grouping expression,
+ * we don't care whether sortop is < or >. The value of nulls_first
+ * should not matter for the same reason.
+ */
+ cl->tleSortGroupRef = ++sortgroupref;
+ get_sort_group_operators(var->vartype,
+ false, true, false,
+ &cl->sortop, &cl->eqop, NULL,
+ &cl->hashable);
+ group_clauses_final = lappend(group_clauses_final, cl);
+ add_column_to_pathtarget(target, (Expr *) var,
+ cl->tleSortGroupRef);
+
+ /*
+ * The aggregation input target must emit this var too.
+ */
+ add_column_to_pathtarget(agg_input, (Expr *) var,
+ cl->tleSortGroupRef);
+ }
+ }
+
+ /*
+ * Build a list of grouping expressions and a list of the corresponding
+ * SortGroupClauses.
+ */
+ i = 0;
+ result = makeNode(RelAggInfo);
+ foreach(lc, target->exprs)
+ {
+ Index sortgroupref = 0;
+ SortGroupClause *cl;
+ Expr *texpr;
+
+ texpr = (Expr *) lfirst(lc);
+
+ Assert(IsA(texpr, Var));
+
+ sortgroupref = target->sortgrouprefs[i++];
+ if (sortgroupref == 0)
+ continue;
+
+ /* find the SortGroupClause in group_clauses_final */
+ cl = get_sortgroupref_clause(sortgroupref, group_clauses_final);
+
+ /* do not add this SortGroupClause if it has already been added */
+ if (list_member(result->group_clauses, cl))
+ continue;
+
+ result->group_clauses = lappend(result->group_clauses, cl);
+ result->group_exprs = list_append_unique(result->group_exprs,
+ texpr);
+ }
+
+ /*
+ * Calculate pathkeys that represent these grouping requirements.
+ */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /*
+ * Add aggregates to the grouping target.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+
+ result->agg_exprs = lappend(result->agg_exprs, aggref);
+ }
+
+ /*
+ * Since neither target nor agg_input is supposed to be identical to the
+ * source reltarget, compute the width and cost again.
+ */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+
+ /*
+ * The number of aggregation input rows is simply the number of rows of the
+ * non-grouped relation, which should have been estimated by now.
+ */
+ result->input_rows = rel->rows;
+
+ /* Estimate the number of groups with equal grouped exprs. */
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ result->input_rows, NULL, NULL);
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+
+ /*
+ * The current implementation of eager aggregation cannot handle
+ * PlaceHolderVar (PHV).
+ *
+ * If we knew that the PHV should be evaluated in this target (and of
+ * course, if its expression matched some Aggref argument), we'd just let
+ * init_grouping_targets add that Aggref. On the other hand, if we knew
+ * that the PHV is evaluated below the current rel, we could ignore it
+ * because the referencing Aggref would take care of propagation of the
+ * value to upper joins.
+ *
+ * The problem is that the same PHV can be evaluated in the target of the
+ * current rel or in that of a lower rel --- depending on the input paths.
+ * For example, consider rel->relids = {A, B, C} and ph_eval_at = {B, C}.
+ * Path "A JOIN (B JOIN C)" implies that the PHV is evaluated by
+ * "(B JOIN C)", while path "(A JOIN B) JOIN C" evaluates the PHV itself.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ if (IS_SIMPLE_REL(rel))
+ {
+ RangeTblEntry *rte = root->simple_rte_array[rel->relid];
+
+ /*
+ * rtekind != RTE_RELATION case is not supported yet.
+ */
+ if (rte->rtekind != RTE_RELATION)
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate needs relations other than the current one.
+ *
+ * If the aggregate needs the current rel plus anything else, then the
+ * problem is that grouping of the current relation could make some
+ * input variables unavailable for the "higher aggregate", and it'd
+ * also decrease the number of input rows the "higher aggregate"
+ * receives.
+ *
+ * If the aggregate does not even need the current rel, then the
+ * current rel should not be grouped because we do not support join of
+ * two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize target for grouped paths (target) as well as a target for
+ * paths that generate input for the grouped paths (agg_input).
+ *
+ * group_exprs_extra_p receives a list of Var nodes for which we need to
+ * construct SortGroupClause. Those vars will then be used as additional
+ * grouping expressions, for the sake of join clauses.
+ *
+ * Return true iff the targets could be initialized.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_exprs_extra_p)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /* Get the sortgroupref if the expr can act as grouping expression. */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ /*
+ * If the target expression can be used as the grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+ }
+ else
+ {
+ if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The variable is needed for a join, however it's neither in
+ * the GROUP BY clause nor can it be derived from it using EC.
+ * (Otherwise it would have to be added to the targets above.)
+ * We need to construct special SortGroupClause for this
+ * variable.
+ *
+ * Note that its tleSortGroupRef needs to be unique within
+ * agg_input, so we need to postpone creation of the
+ * SortGroupClause's until we're done with the iteration of
+ * rel->reltarget->exprs. Also it makes sense for the caller to
+ * do some more checks before it starts to create those
+ * SortGroupClause's.
+ */
+ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * Another reason we might need this variable is that some
+ * aggregate pushed down to this relation references it. In
+ * such a case, add it to "agg_input", but not to "target".
+ * However, if the aggregate is not the only reason for the var
+ * to be in the target, some more checks need to be performed
+ * below.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The Var can be functionally dependent on another expression
+ * of the target, but we cannot check that until we've built
+ * all the expressions for the target.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+ }
+
+ /*
+ * Now we can check whether the expression is functionally dependent on
+ * another one.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ /*
+ * Check if the Var can be in the grouping key even though it's not
+ * mentioned by the GROUP BY clause (and could not be derived using
+ * ECs).
+ */
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The var shouldn't actually be used for grouping key evaluation
+ * (instead, the one this depends on will be), so sortgroupref
+ * should not be important.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * As long as the query is semantically correct, arriving here
+ * means that the var is referenced by a generic grouping
+ * expression but not referenced by any join.
+ *
+ * If eager aggregation supports generic grouping expressions in
+ * the future, create_rel_agg_info() will have to add this variable
+ * to the "agg_input" target and also add the whole generic
+ * expression to "target".
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether given Var appears in Aggref(s) which we consider usable at
+ * relation / join level, and only in the Aggref(s).
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (bms_is_member(var->varno, ac_info->agg_eval_at))
+ break;
+ }
+
+ /* No aggregate references the Var? */
+ if (lc == NULL)
+ return false;
+
+ /* Does the Var appear in the target outside aggregates? */
+ foreach(lc, root->processed_tlist)
+ {
+ TargetEntry *tle = lfirst_node(TargetEntry, lc);
+ List *vars;
+
+ if (IsA(tle->expr, Aggref))
+ continue;
+
+ vars = pull_var_clause((Node *) tle->expr,
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ return false;
+ }
+
+ list_free(vars);
+ }
+
+ /* The Var is in aggregate(s) and only there. */
+ return true;
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ *
+ * Consider pushing the aggregate avg(b.y) down to relation b for the following
+ * query:
+ *
+ * SELECT a.i, avg(b.y)
+ * FROM a JOIN b ON a.j = b.j
+ * GROUP BY a.i;
+ *
+ * Column b.j needs to be used as the grouping key because otherwise it cannot
+ * find its way to the input of the join expression.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when we are checking if the Var is needed by joins above, we
+ * want to exclude the situation where the Var is only needed in final
+ * output. So include "relation 0" here.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return sortgroupref if the given 'expr' can be used as a grouping
+ * expression in grouped paths for base or join relations, or 0 otherwise.
+ *
+ * Note that we also check whether 'expr' is known equal, via equivalence
+ * relationships, to another expr that can act as a grouping expression.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* The expression cannot be used as grouping key. */
+ return 0;
+}
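
The check in is_var_needed_by_join() boils down to a set difference: take the rels where the attribute is needed, remove the current rel's relids plus "relation 0", and see whether anything is left. A rough standalone sketch, assuming relid sets fit in a 64-bit mask (ToyRelids and both functions are hypothetical, and the attr_needed values in the usage note are illustrative rather than taken from a real planner run):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: bit i set means RT index i is a member. */
typedef uint64_t ToyRelids;

static ToyRelids
toy_relid_bit(int relid)
{
    assert(relid >= 0 && relid < 64);
    return (ToyRelids) 1 << relid;
}

/*
 * Return true if the attribute is needed somewhere other than inside
 * 'rel_relids' and other than in the final output ("relation 0").
 */
static bool
toy_var_needed_by_join(ToyRelids attr_needed, ToyRelids rel_relids)
{
    ToyRelids   ignore = rel_relids | toy_relid_bit(0);

    return (attr_needed & ~ignore) != 0;
}
```

For the query in the comment above, take a as RT index 1 and b as RT index 2, and consider pushing the aggregate down to b. If b.j is marked as needed at {1, 2} because of the join clause, the difference against {0, 2} is {1}, so b.j must become an extra grouping key. If b.y is needed only at {0}, the difference is empty and b.y stays out of the grouping keys.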
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index cea777e9d4..d1365229f7 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos,
/*
* Drop known-equal vars, but only if they belong to different
- * relations (see comments for estimate_num_groups)
+ * relations (see comments for estimate_num_groups). We aren't too
+ * fussy about the semantics of "equal" here.
*/
if (vardata->rel != varinfo->rel &&
- exprs_known_equal(root, var, varinfo->var))
+ exprs_known_equal(root, var, varinfo->var, InvalidOid))
{
if (varinfo->ndistinct <= ndistinct)
{
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 69ed9eb1f6..3ef5195323 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -429,6 +429,12 @@ struct PlannerInfo
*/
RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);;
+ /*
+ * list of grouped relation RelAggInfos. One instance of RelAggInfo per
+ * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list.
+ */
+ RelInfoList *agg_info_list;
+
/* Result tlists chosen by grouping_planner for upper-stage processing */
struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore);
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index c43d97b48a..8d03ce2c57 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -310,6 +310,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids);
+extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel,
+ RelAggInfo *agg_info);
+extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids,
+ RelAggInfo **agg_info_p);
extern RelOptInfo *build_join_rel(PlannerInfo *root,
Relids joinrelids,
RelOptInfo *outer_rel,
@@ -344,4 +348,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
RelOptInfo *parent_joinrel, List *restrictlist,
SpecialJoinInfo *sjinfo);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 040a047b81..dcea10888b 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -160,7 +160,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root,
Relids join_relids,
Relids outer_relids,
RelOptInfo *inner_rel);
-extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2);
+extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2,
+ Oid opfamily);
extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root,
ForeignKeyOptInfo *fkinfo,
int colno);
--
2.31.0
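
In the patch above, create_rel_agg_info() gives each extra grouping Var a fresh tleSortGroupRef by first finding the largest ressortgroupref in the processed tlist and then counting up from there. A minimal sketch of that numbering scheme, with plain arrays standing in for the node lists (both function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Index;

/* Largest sortgroupref already in use; 0 means "none assigned". */
static Index
toy_max_sortgroupref(const Index *refs, size_t nrefs)
{
    Index       max = 0;

    for (size_t i = 0; i < nrefs; i++)
        if (refs[i] > max)
            max = refs[i];
    return max;
}

/*
 * Assign a new, unused sortgroupref to each of 'nextra' extra grouping
 * expressions, writing the results into 'out'.
 */
static void
toy_assign_extra_refs(const Index *tlist_refs, size_t ntlist,
                      Index *out, size_t nextra)
{
    Index       next = toy_max_sortgroupref(tlist_refs, ntlist);

    assert(out != NULL || nextra == 0);

    for (size_t i = 0; i < nextra; i++)
        out[i] = ++next;
}
```

Starting above the current maximum keeps the new refs from colliding with those of the query's own GROUP BY and ORDER BY entries.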
[application/octet-stream] v1-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch (7.8K, 6-v1-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch)
From efad6c39e247078c6d3cdf3cf8561bd5d35004e6 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 11:12:18 +0800
Subject: [PATCH v1 2/9] Introduce RelAggInfo structure to store info for
grouped paths.
This commit introduces the RelAggInfo structure to store information needed
to create grouped paths for base and join rels. It also revises the
RelInfoList-related structures and functions so that they can be used
with RelAggInfos.
---
src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++--------
src/include/nodes/pathnodes.h | 73 ++++++++++++++++++++++++++++
2 files changed, 118 insertions(+), 21 deletions(-)
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 9e25750acd..c88da963db 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -36,13 +36,13 @@
/*
- * An entry of a hash table that we use to make lookup for RelOptInfo
- * structures more efficient.
+ * An entry of a hash table that we use to make lookup for RelOptInfo or
+ * RelAggInfo structures more efficient.
*/
typedef struct RelInfoEntry
{
Relids relids; /* hash key --- MUST BE FIRST */
- RelOptInfo *rel;
+ void *data;
} RelInfoEntry;
static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel,
@@ -477,7 +477,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid)
/*
* build_rel_hash
- * Construct the auxiliary hash table for relations.
+ * Construct the auxiliary hash table for relation specific data.
*/
static void
build_rel_hash(RelInfoList *list)
@@ -497,19 +497,27 @@ build_rel_hash(RelInfoList *list)
&hash_ctl,
HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT);
- /* Insert all the already-existing relations */
+ /* Insert all the already-existing relation specific infos */
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
RelInfoEntry *hentry;
bool found;
+ Relids relids;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
+
+ if (IsA(item, RelOptInfo))
+ relids = ((RelOptInfo *) item)->relids;
+ else
+ relids = ((RelAggInfo *) item)->relids;
hentry = (RelInfoEntry *) hash_search(hashtab,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = item;
}
list->hash = hashtab;
@@ -517,9 +525,9 @@ build_rel_hash(RelInfoList *list)
/*
* find_rel_info
- * Find an RelOptInfo entry.
+ * Find a RelOptInfo or a RelAggInfo entry.
*/
-static RelOptInfo *
+static void *
find_rel_info(RelInfoList *list, Relids relids)
{
if (list == NULL)
@@ -550,7 +558,7 @@ find_rel_info(RelInfoList *list, Relids relids)
HASH_FIND,
NULL);
if (hentry)
- return hentry->rel;
+ return hentry->data;
}
else
{
@@ -558,10 +566,18 @@ find_rel_info(RelInfoList *list, Relids relids)
foreach(l, list->items)
{
- RelOptInfo *rel = (RelOptInfo *) lfirst(l);
+ void *item = lfirst(l);
+ Relids item_relids = NULL;
+
+ Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo));
- if (bms_equal(rel->relids, relids))
- return rel;
+ if (IsA(item, RelOptInfo))
+ item_relids = ((RelOptInfo *) item)->relids;
+ else if (IsA(item, RelAggInfo))
+ item_relids = ((RelAggInfo *) item)->relids;
+
+ if (bms_equal(item_relids, relids))
+ return item;
}
}
@@ -576,32 +592,40 @@ find_rel_info(RelInfoList *list, Relids relids)
RelOptInfo *
find_join_rel(PlannerInfo *root, Relids relids)
{
- return find_rel_info(root->join_rel_list, relids);
+ return (RelOptInfo *) find_rel_info(root->join_rel_list, relids);
}
/*
* add_rel_info
- * Add given relation to the given list. Also add it to the auxiliary
+ * Add relation specific info to a list, and also add it to the auxiliary
* hashtable if there is one.
*/
static void
-add_rel_info(RelInfoList *list, RelOptInfo *rel)
+add_rel_info(RelInfoList *list, void *data)
{
+ Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo));
+
/* GEQO requires us to append the new relation to the end of the list! */
- list->items = lappend(list->items, rel);
+ list->items = lappend(list->items, data);
/* store it into the auxiliary hashtable if there is one. */
if (list->hash)
{
+ Relids relids;
RelInfoEntry *hentry;
bool found;
+ if (IsA(data, RelOptInfo))
+ relids = ((RelOptInfo *) data)->relids;
+ else
+ relids = ((RelAggInfo *) data)->relids;
+
hentry = (RelInfoEntry *) hash_search(list->hash,
- &(rel->relids),
+ &relids,
HASH_ENTER,
&found);
Assert(!found);
- hentry->rel = rel;
+ hentry->data = data;
}
}
@@ -1496,7 +1520,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids)
/* If we already made this upperrel for the query, return it */
if (list)
{
- upperrel = find_rel_info(list, relids);
+ upperrel = (RelOptInfo *) find_rel_info(list, relids);
if (upperrel)
return upperrel;
}
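
The relnode.c changes above turn the RelInfoList helpers into containers of untyped pointers and recover each item's relids by checking its node tag. The pattern can be sketched in isolation as follows. ToyRel, ToyAgg and the functions are hypothetical, relid sets are modelled as 64-bit masks, and a linear scan stands in for the optional hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Every item starts with a tag, like NodeTag in the real structs. */
typedef enum ToyTag
{
    T_ToyRel,
    T_ToyAgg
} ToyTag;

/* The two item kinds keep relids at different offsets on purpose. */
typedef struct ToyRel
{
    ToyTag      type;
    int         nrows;
    uint64_t    relids;
} ToyRel;

typedef struct ToyAgg
{
    ToyTag      type;
    uint64_t    relids;
    double      ngroups;
} ToyAgg;

/* Fetch the lookup key from an item of either kind. */
static uint64_t
toy_item_relids(const void *item)
{
    ToyTag      tag = *(const ToyTag *) item;   /* tag is the first field */

    if (tag == T_ToyRel)
        return ((const ToyRel *) item)->relids;

    assert(tag == T_ToyAgg);
    return ((const ToyAgg *) item)->relids;
}

/* Return the item whose relids match, or NULL if there is none. */
static void *
toy_find_info(void *const *items, size_t nitems, uint64_t relids)
{
    for (size_t i = 0; i < nitems; i++)
        if (toy_item_relids(items[i]) == relids)
            return items[i];
    return NULL;
}
```

As with find_join_rel() and fetch_upper_rel() in the patch, callers of such a lookup cast the void * result back to the type they know the list holds.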
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index be51e2c652..d67f725ad6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -1065,6 +1065,79 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes), just like with
+ * RelOptInfo.
+ *
+ * "target" will be used as pathtarget if partial aggregation is applied to
+ * a base relation or join. The same target will also --- if the relation is
+ * a join --- be used to join a grouped path to a non-grouped one. This target
+ * can contain plain-Var grouping expressions and Aggref nodes.
+ *
+ * Note: There's a convention that Aggref expressions are supposed to follow
+ * the other expressions of the target. Iterations of ->exprs may rely on this
+ * arrangement.
+ *
+ * "agg_input" contains Vars used either as grouping expressions or aggregate
+ * arguments. Paths providing the aggregation plan with input data should use
+ * this target. The only difference from reltarget of the non-grouped relation
+ * is that some items can have sortgroupref initialized.
+ *
+ * "input_rows" is the estimated number of input rows for AggPath. It's
+ * actually just a workspace for users of the structure, i.e. it is not
+ * initialized when an instance of the structure is created.
+ *
+ * "grouped_rows" is the estimated number of result rows of the AggPath.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClause, the corresponding grouping expressions and PathKey
+ * respectively.
+ *
+ * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's
+ * paths.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes)
+ */
+ Relids relids;
+
+ /*
+ * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs,
+ * cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that generate input for the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of input tuples for the grouped paths */
+ Cardinality input_rows;
+
+ /* estimated number of result tuples of the grouped relation */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClause's */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* a list of Aggref nodes */
+ List *agg_exprs;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
--
2.31.0
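The add_rel_info() change above stores either a RelOptInfo or a RelAggInfo behind a void pointer and picks the relids field by node tag. As a rough standalone illustration of that dispatch, here is a minimal sketch. The types below are hypothetical, much-reduced stand-ins (relids is a plain bitmask here rather than a Bitmapset), and rel_info_relids() is a name made up for this example; it is not part of the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for NodeTag, RelOptInfo and RelAggInfo. */
typedef enum { T_RelOptInfo, T_RelAggInfo } NodeTag;

typedef struct
{
	NodeTag		type;			/* must be the first field */
	unsigned	relids;			/* simplified: a bitmask, not a Bitmapset */
	double		rows;
} RelOptInfo;

typedef struct
{
	NodeTag		type;			/* must be the first field */
	unsigned	relids;
	double		grouped_rows;
} RelAggInfo;

/* Every node starts with its tag, so it can be read via any node pointer. */
#define nodeTag(ptr) (*(const NodeTag *) (ptr))

/* Return the relids of either node kind, dispatching on the tag. */
static unsigned
rel_info_relids(const void *data)
{
	if (nodeTag(data) == T_RelOptInfo)
		return ((const RelOptInfo *) data)->relids;
	return ((const RelAggInfo *) data)->relids;
}
```

The point of the sketch is only that one list/hashtable implementation can serve both node kinds as long as the key can be recovered from the tag.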
[application/octet-stream] v1-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch (14.3K, 7-v1-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch)
From 9798f1f4e4d1e6aef6b712df452fc5f14e736292 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 18:40:46 +0800
Subject: [PATCH v1 3/9] Set up for eager aggregation by collecting needed
infos
This commit checks if eager aggregation is applicable, and if so, sets
up root->agg_clause_list and root->group_expr_list by collecting
suitable aggregate expressions and grouping expressions in the query.
---
src/backend/optimizer/path/allpaths.c | 1 +
src/backend/optimizer/plan/initsplan.c | 250 ++++++++++++++++++
src/backend/optimizer/plan/planmain.c | 8 +
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 41 +++
src/include/optimizer/paths.h | 1 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
9 files changed, 315 insertions(+), 1 deletion(-)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 351bf2e9e4..9384c54ed9 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -80,6 +80,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = false;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index d4a9d77d7f..36c82bd696 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_class.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -81,6 +82,8 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -328,6 +331,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars,
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no GROUP BY clauses.
+ */
+ if (!root->parse->groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * SRFs are not allowed in aggregate arguments, and we don't want them in
+ * the GROUP BY clause either, so forbid them in general. Whether it would
+ * be correct to evaluate a GROUP BY clause containing an SRF below the
+ * query targetlist still needs to be analyzed. Currently this does not
+ * seem to be an important use case.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Collect aggregate expressions that appear in targetlist and having
+ * clauses.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * Create AggClauseInfo for each aggregate.
+ *
+ * If any aggregate is not suitable, leave root->agg_clause_list as NIL and
+ * return.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * For now we don't try to support GROUPING() expressions.
+ */
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+
+ if (IsA(expr, GroupingFunc))
+ return;
+ }
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same way
+ * as those in the targetlist. Note that HAVING can contain Aggrefs but
+ * not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ AggClauseInfo *ac_info;
+
+ /*
+ * tlist_exprs may also contain Vars, but we only need Aggrefs.
+ */
+ if (IsA(expr, Var))
+ continue;
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ root->agg_clause_list =
+ list_append_unique(root->agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+}
+
+/*
+ * Create GroupExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, leave root->group_expr_list as
+ * NIL and return.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->parse->groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+ Oid eq_op;
+ List *eq_opfamilies;
+ Oid btree_opfamily;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality of grouping keys
+ * per the equality operator implies bitwise equality. Otherwise, if
+ * we put keys of different byte images into the same group, we lose
+ * some information that may be needed to evaluate join clauses above
+ * the pushed-down aggregate node, or the WHERE clause.
+ *
+ * For example, the NUMERIC data type is not supported because values
+ * that fall into the same group according to the equality operator
+ * (e.g. 0 and 0.0) can have different scale.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ /*
+ * Get the equality operator in the btree opfamily.
+ */
+ eq_op = get_opfamily_member(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEqualStrategyNumber);
+ if (!OidIsValid(eq_op))
+ return;
+ eq_opfamilies = get_mergejoin_opfamilies(eq_op);
+ if (!eq_opfamilies)
+ return;
+ btree_opfamily = linitial_oid(eq_opfamilies);
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily);
+ }
+
+ /*
+ * Construct GroupExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupExprInfo *ge_info;
+
+ ge_info = makeNode(GroupExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
/*****************************************************************************
*
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 3341e64d2b..8b8def21ca 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -78,6 +78,8 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -264,6 +266,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 527a2b2734..515e6d7737 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -984,6 +984,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ false,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c97f9a25f0..f841915482 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -401,6 +401,7 @@
#enable_sort = on
#enable_tidscan = on
#enable_group_by_reordering = on
+#enable_eager_aggregate = off
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d67f725ad6..69ed9eb1f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -383,6 +383,12 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupExprInfos */
+ List *group_expr_list;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -3193,6 +3199,41 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * The aggregate expressions that appear in targetlist and having clauses
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * The grouping expressions that appear in grouping clauses
+ */
+typedef struct GroupExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 0e8a9c94ba..040a047b81 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index f2e3fa4c2e..42e0f37859 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -73,6 +73,7 @@ extern void add_other_rels_to_query(PlannerInfo *root);
extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist);
extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
Relids where_needed);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
extern List *deconstruct_jointree(PlannerInfo *root);
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 9be7aca2b8..a83a41b0f8 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -113,6 +113,7 @@ select name, setting from pg_settings where name like 'enable%';
--------------------------------+---------
enable_async_append | on
enable_bitmapscan | on
+ enable_eager_aggregate | off
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -134,7 +135,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(23 rows)
+(24 rows)
-- There are always wait event descriptions for various types.
select type, count(*) > 0 as ok FROM pg_wait_events
--
2.31.0
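The comment in create_grouping_expr_infos() requires that equality of grouping keys implies bitwise equality, and cites NUMERIC 0 versus 0.0 as a counterexample. To make that concrete, here is a small sketch with a toy decimal type. ToyDecimal, toy_equal() and toy_same_image() are hypothetical names for illustration only, and the layout has nothing to do with PostgreSQL's real NUMERIC representation.

```c
#include <stdbool.h>

/* Hypothetical toy decimal: value = digits / 10^scale. */
typedef struct
{
	int			digits;
	int			scale;
} ToyDecimal;

/* Small integer power of ten; adequate for the tiny scales used here. */
static int
ipow10(int n)
{
	int			result = 1;

	while (n-- > 0)
		result *= 10;
	return result;
}

/* Equality as the type's "=" operator would see it: 0 equals 0.0. */
static bool
toy_equal(ToyDecimal a, ToyDecimal b)
{
	return a.digits * ipow10(b.scale) == b.digits * ipow10(a.scale);
}

/* Equality of the stored representation: the scale must match as well. */
static bool
toy_same_image(ToyDecimal a, ToyDecimal b)
{
	return a.digits == b.digits && a.scale == b.scale;
}
```

If 0 and 0.0 were merged into one group below a join, only one of the two representations would survive, so anything evaluated above the partial aggregate that depends on the scale could see a different value than it would without eager aggregation. That is the information loss the BTEQUALIMAGE_PROC check is meant to rule out.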
[application/octet-stream] v1-0006-Build-grouped-relations-out-of-base-relations.patch (9.0K, 8-v1-0006-Build-grouped-relations-out-of-base-relations.patch)
From 4d5639555cb14fa74f20e61ba79c155ec9be8b23 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Wed, 28 Feb 2024 10:03:41 +0800
Subject: [PATCH v1 6/9] Build grouped relations out of base relations
This commit builds grouped relations for each base relation if possible,
and generates aggregation paths for the grouped base relations.
---
src/backend/optimizer/path/allpaths.c | 91 +++++++++++++++++++++++
src/backend/optimizer/util/relnode.c | 101 ++++++++++++++++++++++++++
src/include/optimizer/pathnode.h | 4 +
3 files changed, 196 insertions(+)
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index f47ad04846..ea2341d110 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -96,6 +96,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -120,6 +121,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -188,6 +190,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped base relations for each base rel if possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -329,6 +336,59 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each "plain" base relation build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+
+ /*
+ * Ignore RTEs that are not simple rels. Note that we need to consider
+ * "other rels" here.
+ */
+ if (!IS_SIMPLE_REL(rel))
+ continue;
+
+ rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info);
+ if (rel_grouped)
+ {
+ /* Make the grouped relation available for joining. */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -565,6 +625,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1292,6 +1361,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /* Add paths to the grouped base relation if one exists. */
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+}
+
/*
* add_paths_to_append_rel
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index e7f465ef7b..83cdbb38bc 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,7 @@
#include <limits.h>
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +28,15 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
#include "rewrite/rewriteManip.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/*
@@ -411,6 +415,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo for a grouped base relation out of an existing
+ * non-grouped base relation.
+ *
+ * On success, the new RelOptInfo is returned and the corresponding RelAggInfo
+ * is stored in *agg_info_p.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p)
+{
+ RelOptInfo *rel_plain;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping expressions,
+ * otherwise we cannot reach here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ rel_plain = root->simple_rel_array[relid];
+ Assert(rel_plain != NULL);
+ Assert(IS_SIMPLE_REL(rel_plain));
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel_plain))
+ return NULL;
+
+ /*
+ * Prepare the information we need to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel_plain);
+ if (agg_info == NULL)
+ return NULL;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, rel_plain);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /* return the RelAggInfo structure */
+ *agg_info_p = agg_info;
+
+ return rel_grouped;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying a plain relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain)
+{
+ RelOptInfo *rel_grouped;
+
+ rel_grouped = makeNode(RelOptInfo);
+ memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ rel_grouped->pathlist = NIL;
+ rel_grouped->ppilist = NIL;
+ rel_grouped->partial_pathlist = NIL;
+ rel_grouped->cheapest_startup_path = NULL;
+ rel_grouped->cheapest_total_path = NULL;
+ rel_grouped->cheapest_unique_path = NULL;
+ rel_grouped->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ rel_grouped->part_scheme = NULL;
+ rel_grouped->nparts = -1;
+ rel_grouped->boundinfo = NULL;
+ rel_grouped->partbounds_merged = false;
+ rel_grouped->partition_qual = NIL;
+ rel_grouped->part_rels = NULL;
+ rel_grouped->live_parts = NULL;
+ rel_grouped->all_partrels = NULL;
+ rel_grouped->partexprs = NULL;
+ rel_grouped->nullable_partexprs = NULL;
+ rel_grouped->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ rel_grouped->rows = 0;
+
+ return rel_grouped;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 8d03ce2c57..6b856a5e77 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -306,6 +306,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid,
+ RelAggInfo **agg_info_p);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
--
2.31.0
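build_grouped_rel() above creates the grouped relation by flat-copying the plain one and then resetting the fields that must not be shared. As a rough sketch of that copy-then-reset pattern outside the planner, consider the following. ToyRel and toy_build_grouped_rel() are hypothetical, and malloc() stands in for makeNode(); the real function resets many more fields.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for RelOptInfo. */
typedef struct ToyRel
{
	unsigned	relids;			/* kept: identifies the relation */
	void	   *pathlist;		/* reset: grouped rel gets its own paths */
	void	   *cheapest_total_path;	/* reset along with pathlist */
	int			nparts;			/* reset: partition info is not inherited */
	double		rows;			/* reset: the caller sets the grouped estimate */
} ToyRel;

/* Flat-copy the plain rel, then clear the path, partition and size fields. */
static ToyRel *
toy_build_grouped_rel(const ToyRel *plain)
{
	ToyRel	   *grouped = malloc(sizeof(ToyRel));

	if (grouped == NULL)
		return NULL;

	memcpy(grouped, plain, sizeof(ToyRel));

	grouped->pathlist = NULL;
	grouped->cheapest_total_path = NULL;
	grouped->nparts = -1;
	grouped->rows = 0;

	return grouped;
}
```

The copy is shallow, so pointer fields that are not reset still point at the plain rel's data. That is fine for read-only information such as relids, but it is why everything the grouped rel will populate itself has to be cleared explicitly.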
[application/octet-stream] v1-0007-Build-grouped-relations-out-of-join-relations.patch (19.3K, 9-v1-0007-Build-grouped-relations-out-of-join-relations.patch)
From 429cab42ee94a88eef79dfb3575ded35b8056a1c Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 13:33:09 +0800
Subject: [PATCH v1 7/9] Build grouped relations out of join relations
This commit builds grouped relations for each just-processed join
relation if possible, and generates aggregation paths for the grouped
join relations.
If we are joining rel1 and rel2, the aggregation paths for the grouped
join relation are generated by 1) joining the grouped paths of rel1 to
the plain paths of rel2, or joining the grouped paths of rel2 to the
plain paths of rel1, and 2) adding sorted and hashed partial aggregation
paths on top of paths of the plain join rel except for the topmost join
rel.
This commit also makes the grouped relation for the topmost join rel act
as the upper rel representing the result of partial aggregation, so that
we can add the final aggregation on top of that.
This commit also makes eager aggregation work for partitionwise join and
for geqo.
Starting from this commit, you should be able to see plans with eager
aggregation.
---
src/backend/optimizer/geqo/geqo_eval.c | 84 +++++++++++++----
src/backend/optimizer/path/allpaths.c | 48 ++++++++++
src/backend/optimizer/path/joinrels.c | 115 ++++++++++++++++++++++++
src/backend/optimizer/plan/planner.c | 35 ++++++--
src/backend/optimizer/util/appendinfo.c | 64 +++++++++++++
5 files changed, 320 insertions(+), 26 deletions(-)
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index 1141156899..278857d767 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
MemoryContext oldcxt;
RelOptInfo *joinrel;
Cost fitness;
- int savelength;
- struct HTAB *savehash;
+ int savelength_join_rel;
+ struct HTAB *savehash_join_rel;
+ int savelength_grouped_rel;
+ struct HTAB *savehash_grouped_rel;
+ int savelength_grouped_info;
+ struct HTAB *savehash_grouped_info;
/*
* Create a private memory context that will hold all temp storage
@@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
oldcxt = MemoryContextSwitchTo(mycontext);
/*
- * gimme_tree will add entries to root->join_rel_list, which may or may
- * not already contain some entries. The newly added entries will be
- * recycled by the MemoryContextDelete below, so we must ensure that the
- * list is restored to its former state before exiting. We can do this by
- * truncating the list to its original length. NOTE this assumes that any
- * added entries are appended at the end!
+ * gimme_tree will add entries to root->join_rel_list, root->agg_info_list
+ * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not
+ * already contain some entries. The newly added entries will be recycled
+ * by the MemoryContextDelete below, so we must ensure that each list of
+ * the RelInfoList structures is restored to its former state before
+ * exiting. We can do this by truncating each list to its original length.
+ * NOTE this assumes that any added entries are appended at the end!
*
- * We also must take care not to mess up the outer join_rel_list->hash, if
- * there is one. We can do this by just temporarily setting the link to
- * NULL. (If we are dealing with enough join rels, which we very likely
- * are, a new hash table will get built and used locally.)
+ * We also must take care not to mess up the outer hash tables of the
+ * RelInfoList structures, if any. We can do this by just temporarily
+ * setting each link to NULL. (If we are dealing with enough join rels,
+ * which we very likely are, new hash tables will get built and used
+ * locally.)
*
* join_rel_level[] shouldn't be in use, so just Assert it isn't.
*/
- savelength = list_length(root->join_rel_list->items);
- savehash = root->join_rel_list->hash;
+ savelength_join_rel = list_length(root->join_rel_list->items);
+ savehash_join_rel = root->join_rel_list->hash;
+
+ savelength_grouped_rel =
+ list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items);
+ savehash_grouped_rel =
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash;
+
+ savelength_grouped_info = list_length(root->agg_info_list->items);
+ savehash_grouped_info = root->agg_info_list->hash;
+
Assert(root->join_rel_level == NULL);
root->join_rel_list->hash = NULL;
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL;
+ root->agg_info_list->hash = NULL;
/* construct the best path for the given combination of relations */
joinrel = gimme_tree(root, tour, num_gene);
@@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene)
fitness = DBL_MAX;
/*
- * Restore join_rel_list to its former state, and put back original
- * hashtable if any.
+ * Restore each of the lists in join_rel_list, agg_info_list and
+ * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put back
+ * original hashtable if any.
*/
root->join_rel_list->items = list_truncate(root->join_rel_list->items,
- savelength);
- root->join_rel_list->hash = savehash;
+ savelength_join_rel);
+ root->join_rel_list->hash = savehash_join_rel;
+
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items =
+ list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items,
+ savelength_grouped_rel);
+ root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel;
+
+ root->agg_info_list->items = list_truncate(root->agg_info_list->items,
+ savelength_grouped_info);
+ root->agg_info_list->hash = savehash_grouped_info;
/* release all the memory acquired within gimme_tree */
MemoryContextSwitchTo(oldcxt);
@@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, joinrel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, joinrel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index ea2341d110..440a5daec7 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -3864,6 +3864,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3884,6 +3888,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of the
+ * paths of this rel. After that, we're done creating paths for
+ * the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4752,6 +4777,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info;
+
+ rel_grouped = find_grouped_rel(root, child_rel->relids,
+ &agg_info);
+ if (rel_grouped)
+ {
+ generate_grouped_paths(root, rel_grouped, child_rel,
+ agg_info);
+ set_cheapest(rel_grouped);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 4750579b0a..a9ef081597 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,11 +16,13 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "partitioning/partbounds.h"
#include "utils/memutils.h"
+#include "utils/selfuncs.h"
static void make_rels_by_clause_joins(PlannerInfo *root,
@@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -753,6 +758,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -864,6 +873,107 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation out of 'joinrel' if eager aggregation is
+ * possible and the 'joinrel' can produce grouped paths.
+ *
+ * We also generate partial aggregation paths for the grouped relation by
+ * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by
+ * joining the grouped paths of 'rel2' to the plain paths of 'rel1'.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ Relids joinrelids;
+ RelOptInfo *rel_grouped;
+ RelAggInfo *agg_info = NULL;
+ RelOptInfo *rel1_grouped;
+ RelOptInfo *rel2_grouped;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ joinrelids = bms_union(rel1->relids, rel2->relids);
+ rel_grouped = find_grouped_rel(root, joinrelids, &agg_info);
+
+ /*
+ * Construct a new RelOptInfo for the grouped join relation if there is no
+ * existing one.
+ */
+ if (rel_grouped == NULL)
+ {
+ /*
+ * Prepare the information we need to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /* build a grouped relation out of the plain relation */
+ rel_grouped = build_grouped_rel(root, joinrel);
+ rel_grouped->reltarget = agg_info->target;
+ rel_grouped->rows = agg_info->grouped_rows;
+
+ /*
+ * Make the grouped relation available for further joining or for
+ * acting as the upper rel representing the result of partial
+ * aggregation.
+ */
+ add_grouped_rel(root, rel_grouped, agg_info);
+ }
+
+ Assert(agg_info != NULL);
+
+ /* retrieve the grouped relations for the two input rels */
+ rel1_grouped = find_grouped_rel(root, rel1->relids, NULL);
+ rel2_grouped = find_grouped_rel(root, rel2->relids, NULL);
+
+ /* we should not see dummy grouped relations */
+ Assert(rel1_grouped == NULL || !IS_DUMMY_REL(rel1_grouped));
+ Assert(rel2_grouped == NULL || !IS_DUMMY_REL(rel2_grouped));
+
+ /* Nothing to do if there's no grouped relation. */
+ if (rel1_grouped == NULL &&
+ rel2_grouped == NULL)
+ return;
+
+ /*
+ * Join of two grouped relations is currently not supported. In such a
+ * case, grouping of one side would change the occurrence of the other
+ * side's aggregate transient states on the input of the final aggregation.
+ * This can be handled by adjusting the transient states, but it's not
+ * worth the effort for now.
+ */
+ if (rel1_grouped != NULL &&
+ rel2_grouped != NULL)
+ return;
+
+ /* generate partial aggregation paths for the grouped relation */
+ if (rel1_grouped != NULL)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped,
+ sjinfo, restrictlist);
+ }
+ else if (rel2_grouped != NULL)
+ {
+ set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped,
+ sjinfo, restrictlist);
+ populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped,
+ sjinfo, restrictlist);
+ }
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1653,6 +1763,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index be4e182869..f8f2a09f1b 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3946,10 +3946,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
+
+ /*
+ * Now choose the best path(s) for partially_grouped_rel.
+ *
+ * Note that the non-partial paths can come either from the Gather above or
+ * from eager aggregation.
+ */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
set_cheapest(partially_grouped_rel);
- }
/*
* Estimate number of groups.
@@ -7043,6 +7049,13 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * The partially_grouped_rel may already have been created by eager
+ * aggregation.
+ */
+ partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL);
+ Assert(enable_eager_aggregate || partially_grouped_rel == NULL);
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7066,19 +7079,25 @@ create_partial_grouping_paths(PlannerInfo *root,
* If we can't partially aggregate partial paths, and we can't partially
* aggregate non-partial paths, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
+ *
+ * Note that the partially_grouped_rel could have been already created and
+ * populated with appropriate paths by eager aggregation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
!force_rel_creation)
- return NULL;
+ return partially_grouped_rel;
/*
* Build a new upper relation to represent the result of partially
- * aggregating the rows from the input relation.
- */
- partially_grouped_rel = fetch_upper_rel(root,
- UPPERREL_PARTIAL_GROUP_AGG,
- grouped_rel->relids);
+ * aggregating the rows from the input relation. The relation may already
+ * exist due to eager aggregation, in which case we don't need to create
+ * it.
+ */
+ if (partially_grouped_rel == NULL)
+ partially_grouped_rel = fetch_upper_rel(root,
+ UPPERREL_PARTIAL_GROUP_AGG,
+ grouped_rel->relids);
partially_grouped_rel->consider_parallel =
grouped_rel->consider_parallel;
partially_grouped_rel->reloptkind = grouped_rel->reloptkind;
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 51fdeace7d..7016473047 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -495,6 +495,70 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ context->nappinfos,
+ context->appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ newinfo->agg_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
--
2.31.0
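
The comment in make_grouped_join_rel() above says that joining two grouped relations is skipped because grouping one side changes how often the other side's transient states reach the final aggregation. Here is a minimal, self-contained sketch of that effect for count(*). It is not PostgreSQL code: all function names are hypothetical, and it assumes a single join key value with na matching rows on the "a" side and nb on the "b" side.

```c
#include <assert.h>

/* Plain plan: count(*) over the join sees every pair of matching rows. */
static long
count_plain_join(long na, long nb)
{
	long		count = 0;

	for (long i = 0; i < na; i++)
		for (long j = 0; j < nb; j++)
			count++;
	return count;
}

/*
 * Only the "b" side is grouped: its single partial row carries the state nb
 * and is joined to each of the na plain rows, so the final aggregation
 * combines that state na times.
 */
static long
count_one_side_grouped(long na, long nb)
{
	long		partial_b = nb; /* transient count state of b's group */
	long		final = 0;

	for (long i = 0; i < na; i++)
		final += partial_b;
	return final;
}

/*
 * Both sides grouped: at most one row per side survives, so b's state
 * reaches the final aggregation once instead of na times.  With 'adjust'
 * set, the state is scaled by the other side's row count, which is the
 * kind of fix-up the comment alludes to (hypothetical illustration).
 */
static long
count_both_sides_grouped(long na, long nb, int adjust)
{
	long		rows_a = (na > 0) ? 1 : 0;	/* rows left after grouping a */
	long		partial_a = na; /* how many rows a's group stands for */
	long		partial_b = nb;
	long		final = 0;

	for (long i = 0; i < rows_a; i++)
		final += adjust ? partial_b * partial_a : partial_b;
	return final;
}
```

With na = 3 and nb = 4, the plain and one-side-grouped variants both return 12, the unadjusted both-sides variant returns 4, and the adjusted one returns 12 again.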
[application/octet-stream] v1-0009-Add-README.patch (4.8K, 10-v1-0009-Add-README.patch)
From 2037ffb3a2636203d4105c2ee0e47b9aa67041d7 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 13:41:36 +0800
Subject: [PATCH v1 9/9] Add README
---
src/backend/optimizer/README | 88 ++++++++++++++++++++++++++++++++++++
1 file changed, 88 insertions(+)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..fa5cdc135f 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,91 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-------------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of the Agg node. However, if the groups are large enough, it may be more
+efficient to apply partial aggregation to the output of a base relation scan,
+and finalize it once all relations of the query have been joined:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y)
+ FROM a JOIN b ON a.i = b.j
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Seq Scan on b
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which can in turn lead to considerably lower join cost.
+
+Note that the GROUP BY expression might not be useful for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a". However, the equivalence class {a.i,
+b.j} allows us to use the b.j column as a grouping key for the partial
+aggregation of the "b" table. The equivalence class mechanism is suitable
+because it's designed to derive join clauses, and at the same time the join
+clauses determine the choice of grouping columns of the partial aggregate: the
+only way for the partial aggregate to provide upper join(s) with input values
+is to have the join input expression(s) in the grouping key; besides grouping
+columns, the partial aggregate can only produce the transient states of the
+aggregate functions, but aggregate functions cannot be referenced by the JOIN
+clauses.
+
+As for correctness, the join node treats the output of the partial aggregate
+as equivalent to the output of a plain (non-aggregated) relation scan. That
+is, a group (i.e. a row of the partial aggregate output) matches the other
+side of the join if and only if each row of the non-aggregated relation
+does. In other words, all rows belonging to the same group have the same value
+of the join columns (as mentioned above, a join cannot reference any output
+expressions of the partial aggregate other than the grouping expressions).
+
+However, there's a restriction from the aggregate's perspective: the aggregate
+cannot be pushed down if any column referenced by either grouping expression
+or aggregate function can be set to NULL by an outer join above the relation
+to which we want to apply the partial aggregation. The point is that those
+NULL values would not appear on the input of the pushed-down aggregate, so it
+could either group the rows differently than the aggregate at the top of the
+plan, or compute wrong values of the aggregate functions.
+
+Besides a base relation, the aggregation can also be pushed down to a join:
+
+ EXPLAIN (COSTS OFF)
+ SELECT a.i, avg(b.y + c.z)
+ FROM a JOIN b ON a.i = b.j
+ JOIN c ON b.j = c.i
+ GROUP BY a.i;
+
+ Finalize HashAggregate
+ Group Key: a.i
+ -> Nested Loop
+ -> Partial HashAggregate
+ Group Key: b.j
+ -> Hash Join
+ Hash Cond: (b.j = c.i)
+ -> Seq Scan on b
+ -> Hash
+ -> Seq Scan on c
+ -> Index Only Scan using a_pkey on a
+ Index Cond: (i = b.j)
+
+Whether the Agg node is created out of a base relation or out of a join, it's
+added to a separate RelOptInfo that we call a "grouped relation". A grouped
+relation can be joined to a non-grouped relation, which results in a grouped
+relation too. A join of two grouped relations does not seem to be very useful
+and is currently not supported.
+
+If query_planner produces a grouped relation that contains valid paths, these
+are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further
+processing of these paths then does not differ from processing of other
+partially grouped paths.
--
2.31.0
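
The README text above argues that partially aggregating "b" on the join column before the join, and finalizing afterwards, gives the same avg(b.y) per a.i as aggregating after the join. The following self-contained sketch illustrates that equivalence outside of PostgreSQL. Everything in it is an assumption made for illustration: the names are hypothetical, table "a" is reduced to its join column, join keys are small integers in [0, NKEYS), and the transient state of avg() is modelled as a (sum, count) pair.

```c
#include <assert.h>
#include <stddef.h>

#define NKEYS 8					/* assumption: join keys lie in [0, NKEYS) */

typedef struct { int j; double y; } RowB;			/* a row of "b" */
typedef struct { double sum; long count; } AvgState;	/* avg() transient state */

static void
reset_states(AvgState *s)
{
	for (size_t k = 0; k < NKEYS; k++)
	{
		s[k].sum = 0.0;
		s[k].count = 0;
	}
}

/* Lazy plan: join a and b first, then aggregate the joined rows by a.i. */
static void
avg_join_then_agg(const int *a, size_t na, const RowB *b, size_t nb,
				  AvgState *out)
{
	reset_states(out);
	for (size_t x = 0; x < na; x++)
	{
		assert(a[x] >= 0 && a[x] < NKEYS);
		for (size_t y = 0; y < nb; y++)
		{
			if (a[x] == b[y].j)
			{
				out[a[x]].sum += b[y].y;
				out[a[x]].count++;
			}
		}
	}
}

/* Eager plan: partially aggregate b by b.j, join the states, then combine. */
static void
avg_agg_then_join(const int *a, size_t na, const RowB *b, size_t nb,
				  AvgState *out)
{
	AvgState	partial[NKEYS];

	reset_states(partial);
	reset_states(out);

	/* Partial aggregate: one transient state per value of the join column. */
	for (size_t y = 0; y < nb; y++)
	{
		assert(b[y].j >= 0 && b[y].j < NKEYS);
		partial[b[y].j].sum += b[y].y;
		partial[b[y].j].count++;
	}

	/* Join each row of a to its group's state and combine per a.i. */
	for (size_t x = 0; x < na; x++)
	{
		assert(a[x] >= 0 && a[x] < NKEYS);
		if (partial[a[x]].count == 0)
			continue;			/* no matching group in b */
		out[a[x]].sum += partial[a[x]].sum;
		out[a[x]].count += partial[a[x]].count;
	}
}

/* Final function of avg(); returns 0.0 for an empty group in this sketch. */
static double
avg_finalize(AvgState s)
{
	return s.count > 0 ? s.sum / (double) s.count : 0.0;
}
```

The join in the eager variant touches one state per group of "b" rather than every row of "b", which is where the reduction in join input comes from. Duplicate values in "a" are handled by combining the same state once per matching row of "a".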
[application/octet-stream] v1-0008-Add-test-cases.patch (66.8K, 11-v1-0008-Add-test-cases.patch)
From 09fed8131d6b2def5e5d76c7b73e86a9ae997c7a Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 23 Feb 2024 13:41:22 +0800
Subject: [PATCH v1 8/9] Add test cases
---
src/test/regress/expected/eager_aggregate.out | 1270 +++++++++++++++++
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 205 +++
3 files changed, 1476 insertions(+), 1 deletion(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 0000000000..2d7dec8a5d
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1270 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(15 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 6 | 501
+ 7 | 502
+ 3 | 498
+ 4 | 499
+ 9 | 504
+ 5 | 500
+ 8 | 503
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(22 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 6 | 507
+ 7 | 509
+ 3 | 501
+ 4 | 503
+ 9 | 513
+ 5 | 505
+ 8 | 511
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t3.c, t2.b
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t3.c, t2.b
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ QUERY PLAN
+------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.a, avg(t3.c)
+ Group Key: t3.a
+ -> Sort
+ Output: t3.a, (PARTIAL avg(t3.c))
+ Sort Key: t3.a
+ -> Hash Left Join
+ Output: t3.a, (PARTIAL avg(t3.c))
+ Hash Cond: (t3.b = t1.b)
+ -> Partial HashAggregate
+ Output: t3.a, t3.b, PARTIAL avg(t3.c)
+ Group Key: t3.a, t3.b
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(18 rows)
+
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ a | avg
+---+-----
+ 0 | 505
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ QUERY PLAN
+------------------------------------------------------
+ HashAggregate
+ Output: t3.a, avg(t3.c)
+ Group Key: t3.a
+ -> Hash Right Join
+ Output: t3.a, t3.c
+ Hash Cond: (t3.b = t1.b)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(12 rows)
+
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+ a | avg
+---+-----
+ 8 | 503
+ |
+ 9 | 504
+ 7 | 502
+ 1 | 496
+ 5 | 500
+ 4 | 499
+ 2 | 497
+ 6 | 501
+ 3 | 498
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Gather
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(46 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+ x | sum | count
+----+------+-------
+ 6 | 1100 | 100
+ 0 | 500 | 100
+ 12 | 700 | 100
+ 18 | 1300 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(46 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+ y | sum | count
+----+------+-------
+ 6 | 1100 | 100
+ 0 | 500 | 100
+ 18 | 1300 | 100
+ 12 | 700 | 100
+ 24 | 900 | 100
+(5 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '10'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ Hash Cond: (t2_3.y = t1_3.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_3
+ Output: t2_3.x, t2_3.y
+ -> Hash
+ Output: t1_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x))
+ -> Partial HashAggregate
+ Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x)
+ Group Key: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(41 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+ x | sum | count
+----+------+-------
+ 4 | 1200 | 50
+ 14 | 1200 | 50
+ 18 | 900 | 50
+ 2 | 600 | 50
+ 12 | 600 | 50
+ 8 | 900 | 50
+(6 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(67 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+ x | sum
+----+-------
+ 4 | 18000
+ 2 | 14000
+ 8 | 26000
+ 6 | 22000
+ 0 | 10000
+ 16 | 22000
+ 10 | 10000
+ 14 | 18000
+ 12 | 14000
+ 18 | 26000
+ 26 | 22000
+ 28 | 26000
+ 22 | 14000
+ 20 | 10000
+ 24 | 18000
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Sort
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Sort Key: t3_1.y, t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Sort
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Sort Key: t3_2.y, t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_2
+ Output: t1_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y)))
+ Hash Cond: (t2_3.x = t1_3.x)
+ -> Partial GroupAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y))
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Sort
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Sort Key: t3_3.y, t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash
+ Output: t1_3.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_3
+ Output: t1_3.x
+(73 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+ y | sum
+----+-------
+ 0 | 7500
+ 2 | 13500
+ 4 | 19500
+ 6 | 25500
+ 8 | 31500
+ 10 | 22500
+ 12 | 28500
+ 14 | 34500
+ 16 | 40500
+ 18 | 46500
+(10 rows)
+
+RESET enable_hashagg;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(76 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+ x | sum | count
+----+-------+-------
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 7 | 8092 | 1156
+ 1 | 1156 | 1156
+ 5 | 5780 | 1156
+ 4 | 4624 | 1156
+ 2 | 2312 | 1156
+ 0 | 0 | 1089
+ 6 | 6936 | 1156
+ 3 | 3468 | 1156
+ 11 | 11979 | 1089
+ 13 | 14157 | 1089
+ 10 | 11560 | 1156
+ 14 | 15246 | 1089
+ 12 | 13068 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 16 | 17424 | 1089
+ 15 | 16335 | 1089
+ 19 | 20691 | 1089
+ 24 | 26136 | 1089
+ 21 | 22869 | 1089
+ 23 | 25047 | 1089
+ 22 | 23958 | 1089
+ 20 | 21780 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 25 | 27225 | 1089
+ 29 | 31581 | 1089
+ 28 | 30492 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash Join
+ Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.y, t1_5.x
+ -> Hash
+ Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*)
+ Group Key: t2_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+(64 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+ y | sum | count
+----+-------+-------
+ 29 | 31581 | 1089
+ 4 | 4624 | 1156
+ 0 | 0 | 1089
+ 10 | 11560 | 1156
+ 9 | 10404 | 1156
+ 7 | 8092 | 1156
+ 15 | 16335 | 1089
+ 6 | 6936 | 1156
+ 26 | 28314 | 1089
+ 12 | 13068 | 1089
+ 24 | 26136 | 1089
+ 19 | 20691 | 1089
+ 25 | 27225 | 1089
+ 21 | 22869 | 1089
+ 14 | 15246 | 1089
+ 3 | 3468 | 1156
+ 17 | 18513 | 1089
+ 28 | 30492 | 1089
+ 22 | 23958 | 1089
+ 20 | 21780 | 1089
+ 13 | 14157 | 1089
+ 1 | 1156 | 1156
+ 5 | 5780 | 1156
+ 18 | 19602 | 1089
+ 2 | 2312 | 1156
+ 16 | 17424 | 1089
+ 27 | 29403 | 1089
+ 23 | 25047 | 1089
+ 11 | 11979 | 1089
+ 8 | 9248 | 1156
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------
+ Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(111 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+ x | sum | count
+----+---------+-------
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 7 | 550256 | 39304
+ 1 | 78608 | 39304
+ 5 | 393040 | 39304
+ 4 | 314432 | 39304
+ 2 | 157216 | 39304
+ 0 | 0 | 35937
+ 6 | 471648 | 39304
+ 3 | 235824 | 39304
+ 11 | 790614 | 35937
+ 13 | 934362 | 35937
+ 10 | 786080 | 39304
+ 14 | 1006236 | 35937
+ 12 | 862488 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 16 | 1149984 | 35937
+ 15 | 1078110 | 35937
+ 19 | 1365606 | 35937
+ 24 | 1724976 | 35937
+ 21 | 1509354 | 35937
+ 23 | 1653102 | 35937
+ 22 | 1581228 | 35937
+ 20 | 1437480 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 25 | 1796850 | 35937
+ 29 | 2084346 | 35937
+ 28 | 2012472 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t3_1.y, t2_1.x, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t3_2.y, t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t3_2.y, t2_2.x, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t3_3.y, t2_3.x, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t3_4.y, t2_4.x, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4
+ Output: t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ Hash Cond: (t1_5.x = t2_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5
+ Output: t1_5.x
+ -> Hash
+ Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*)
+ Group Key: t3_5.y, t2_5.x, t3_5.x
+ -> Hash Join
+ Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x
+ Hash Cond: (t2_5.x = t3_5.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5
+ Output: t2_5.y, t2_5.x
+ -> Hash
+ Output: t3_5.y, t3_5.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5
+ Output: t3_5.y, t3_5.x
+(99 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+ y | sum | count
+----+---------+-------
+ 29 | 2084346 | 35937
+ 4 | 314432 | 39304
+ 0 | 0 | 35937
+ 10 | 786080 | 39304
+ 9 | 707472 | 39304
+ 7 | 550256 | 39304
+ 15 | 1078110 | 35937
+ 6 | 471648 | 39304
+ 26 | 1868724 | 35937
+ 12 | 862488 | 35937
+ 24 | 1724976 | 35937
+ 19 | 1365606 | 35937
+ 25 | 1796850 | 35937
+ 21 | 1509354 | 35937
+ 14 | 1006236 | 35937
+ 3 | 235824 | 39304
+ 17 | 1221858 | 35937
+ 28 | 2012472 | 35937
+ 22 | 1581228 | 35937
+ 20 | 1437480 | 35937
+ 13 | 934362 | 35937
+ 1 | 78608 | 39304
+ 5 | 393040 | 39304
+ 18 | 1293732 | 35937
+ 2 | 157216 | 39304
+ 16 | 1149984 | 35937
+ 27 | 1940598 | 35937
+ 23 | 1653102 | 35937
+ 11 | 790614 | 35937
+ 8 | 628864 | 39304
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1d8a414eea..250a9dba21 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..aba2c41557
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,205 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+
+-- Produce results with hash aggregation
+SET enable_hashagg TO on;
+SET enable_sort TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+SET enable_sort TO on;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a;
+
+SET enable_hashagg TO default;
+SET enable_sort TO default;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+SELECT t3.a, avg(t3.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t3 t3 ON t1.b = t3.b GROUP BY t3.a;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
--
2.31.0
* Re: Eager aggregation, take 3
@ 2025-06-13 07:41 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
2 siblings, 2 replies; 55+ messages in thread
From: Richard Guo @ 2025-06-13 07:41 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
I've switched back to this thread and will begin by working through
the key concerns that were previously raised.
The first concern is the lack of a proof demonstrating the correctness
of this transformation. To address this, I plan to include a detailed
proof in the README, along the lines of the following.
====== proof start ======
To prove that the transformation is correct, we partition the tables
in the FROM clause into two groups: those that contain at least one
aggregation column, and those that do not contain any aggregation
columns. Each group can be treated as a single relation formed by the
Cartesian product of the tables within that group. Therefore, without
loss of generality, we can assume that the FROM clause contains
exactly two relations, R1 and R2, where R1 represents the relation
containing all aggregation columns, and R2 represents the relation
without any aggregation columns.
Let the query be of the form:
    SELECT G, AGG(A)
    FROM R1 JOIN R2 ON J
    GROUP BY G;
where G is the set of grouping keys that may include columns from R1
and/or R2; AGG(A) is an aggregate function over columns A from R1; J
is the join condition between R1 and R2.
The transformation of eager aggregation is:
    GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
    =
    GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1)
                               JOIN R2 ON J)
This equivalence holds under the following conditions:
1) AGG is decomposable, meaning that it can be computed in two stages:
   a partial aggregation followed by a final aggregation;
2) The set G1 used in the pre-aggregation of R1 includes:
   * all columns from R1 that are part of the grouping keys G, and
   * all columns from R1 that appear in the join condition J.
3) The grouping operator for any column in G1 must be compatible with
   the operator used for that column in the join condition J.
Since G1 includes all columns from R1 that appear in either the
grouping keys G or the join condition J, all rows within each partial
group have identical values for both the grouping keys and the
join-relevant columns from R1, assuming compatible operators are used.
As a result, the rows within a partial group are indistinguishable in
terms of their contribution to the aggregation and their behavior in
the join. This ensures that all rows in the same partial group share
the same "destiny": they either all match or all fail to match a given
row in R2. Because the aggregate function AGG is decomposable,
aggregating the partial results after the join yields the same final
result as aggregating after the full join, thereby preserving query
semantics.
Q.E.D.
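As an illustration only (not part of the patch), the equivalence above
can be checked on a toy data set. The sketch below assumes two
hypothetical tables, r1(g, j, a) and r2(j, h), in an in-memory SQLite
database driven from Python. It uses G = (r1.g, r2.h), J = (r1.j = r2.j)
and G1 = (g, j). The decomposable aggregates are sum and count, whose
partial results are combined with sum. It demonstrates the idea on one
data set and says nothing about how the planner implements it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r1 (g INTEGER, j INTEGER, a INTEGER);  -- hypothetical R1
    CREATE TABLE r2 (j INTEGER, h INTEGER);             -- hypothetical R2
""")
conn.executemany("INSERT INTO r1 VALUES (?, ?, ?)",
                 [(i % 3, i % 5, i) for i in range(60)])
conn.executemany("INSERT INTO r2 VALUES (?, ?)",
                 [(i % 7, i % 2) for i in range(40)])

# Aggregate after the join: GROUP BY G, AGG(A) on (R1 JOIN R2 ON J).
lazy = """
    SELECT r1.g, r2.h, sum(r1.a) AS s, count(r1.a) AS c
    FROM r1 JOIN r2 ON r1.j = r2.j
    GROUP BY r1.g, r2.h
    ORDER BY 1, 2
"""

# Pre-aggregate R1 on G1 = (g, j), join, then combine the partial results.
eager = """
    SELECT p.g, r2.h, sum(p.s) AS s, sum(p.c) AS c
    FROM (SELECT g, j, sum(a) AS s, count(a) AS c
          FROM r1
          GROUP BY g, j) AS p
    JOIN r2 ON p.j = r2.j
    GROUP BY p.g, r2.h
    ORDER BY 1, 2
"""

lazy_rows = conn.execute(lazy).fetchall()
eager_rows = conn.execute(eager).fetchall()
print(lazy_rows == eager_rows)
```

Dropping j from the inner GROUP BY would leave the subquery without a
well-defined join column, which is condition 2 in concrete form.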
The second concern is that a RelOptInfo representing a grouped
relation may include paths that produce different row sets due to
partial aggregation being applied at different join levels. This
potentially violates a fundamental assumption in the planner.
Additionally, the patch currently performs an exhaustive search by
exploring partial aggregation at every possible join level, leading to
excessive planning effort, which may not be justified by the
cost-benefit ratio.
To address these concerns, I'm thinking that maybe we can adopt a
strategy where partial aggregation is only pushed to the lowest
possible level in the join tree that is deemed useful. In other
words, if we can build a grouped path like "AGG(B) JOIN A" -- and
AGG(B) yields a significant reduction in row count -- we skip
exploring alternatives like "AGG(A JOIN B)".
This is somewhat analogous to how we handle qual clauses: we only push
a qual clause down to the lowest scan or join level that includes all
the relations it references -- following the "filter early, join late"
principle. For example, if predicate Pb only references B, we only
consider "A JOIN sigma[Pb](B)" and skip "sigma[Pb](A JOIN B)". (Note
that if Pb involves costly functions and the join is highly selective,
we may want to apply the predicate after the join.)
This ensures that all grouped paths for the same grouped relation
produce the same set of rows (e.g., consider "A JOIN AGG(B) JOIN C"
vs. "AGG(B) JOIN C JOIN A"). As a result, we avoid the complexity of
comparing costs between different grouped paths of the same grouped
relation, and also eliminate the need for special handling of row
estimates on join paths. It also significantly reduces planning
effort.
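To make the row-set concern concrete, here is a small sketch, again
using hypothetical tables a(x), b(x, y, v) and c(y) in an in-memory
SQLite database rather than anything from the patch. It builds the
grouped version of {B, C} in two ways: with partial aggregation applied
at B, and with it applied on top of B JOIN C. The two variants return
different numbers of rows for what would be the same grouped relation,
yet both finalize to the same answer as the plain query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (x INTEGER, y INTEGER, v INTEGER);
    CREATE TABLE c (y INTEGER);
""")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(4)])
conn.executemany("INSERT INTO b VALUES (?, ?, ?)",
                 [(i % 4, i % 6, i) for i in range(48)])
conn.executemany("INSERT INTO c VALUES (?)", [(i % 6,) for i in range(12)])

# Variant 1: partial aggregation at B (keys: both join columns), then JOIN C.
AGG_AT_B = """
    SELECT pb.x AS x, pb.s AS s
    FROM (SELECT x, y, sum(v) AS s FROM b GROUP BY x, y) AS pb
    JOIN c ON pb.y = c.y
"""

# Variant 2: partial aggregation on top of B JOIN C (key: join column to A).
AGG_AT_BC = """
    SELECT b.x AS x, sum(b.v) AS s
    FROM b JOIN c ON b.y = c.y
    GROUP BY b.x
"""


def finalize(grouped_sql):
    """Join a grouped {B, C} variant to A and run the final aggregation."""
    return conn.execute(f"""
        SELECT a.x, sum(g.s)
        FROM a JOIN ({grouped_sql}) AS g ON a.x = g.x
        GROUP BY a.x
        ORDER BY a.x
    """).fetchall()


rows_at_b = conn.execute(AGG_AT_B).fetchall()
rows_at_bc = conn.execute(AGG_AT_BC).fetchall()
plain = conn.execute("""
    SELECT a.x, sum(b.v)
    FROM a JOIN b ON a.x = b.x JOIN c ON b.y = c.y
    GROUP BY a.x
    ORDER BY a.x
""").fetchall()

print(len(rows_at_b), len(rows_at_bc))  # row counts of the two variants
print(finalize(AGG_AT_B) == finalize(AGG_AT_BC) == plain)
```

Fixing a single level at which partial aggregation is applied removes
the first discrepancy, so every path of the grouped relation describes
the same rows.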
While this approach may miss potentially more efficient plans where
applying partial aggregation at a higher join level would yield better
performance, it strikes a practical balance: we can still find plans
that outperform those without eager aggregation, without incurring
excessive planning overhead. As discussed earlier, it's uncommon in
practice to encounter multiple joins that dramatically inflate row
counts. So in most cases, pushing partial aggregation to the lowest
level where it offers a significant row count reduction tends to be
the most efficient strategy.
I think this heuristic serves as a good starting point, and we can
look into extending it with more advanced strategies as the feature
evolves.
Any thoughts?
Thanks
Richard
* Re: Eager aggregation, take 3
@ 2025-06-26 02:01 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
1 sibling, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-06-26 02:01 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Fri, Jun 13, 2025 at 4:41 PM Richard Guo <[email protected]> wrote:
> I've switched back to this thread and will begin by working through
> the key concerns that were previously raised.
>
> The first concern is the lack of a proof demonstrating the correctness
> of this transformation. To address this, I plan to include a detailed
> proof in the README, along the lines of the following.
> The second concern is that a RelOptInfo representing a grouped
> relation may include paths that produce different row sets due to
> partial aggregation being applied at different join levels. This
> potentially violates a fundamental assumption in the planner.
>
> Additionally, the patch currently performs an exhaustive search by
> exploring partial aggregation at every possible join level, leading to
> excessive planning effort, which may not be justified by the
> cost-benefit ratio.
>
> To address these concerns, I'm thinking that maybe we can adopt a
> strategy where partial aggregation is only pushed to the lowest
> possible level in the join tree that is deemed useful. In other
> words, if we can build a grouped path like "AGG(B) JOIN A" -- and
> AGG(B) yields a significant reduction in row count -- we skip
> exploring alternatives like "AGG(A JOIN B)".
Here is the patch based on the proposed ideas. It includes the proof
of correctness in the README and implements the strategy of pushing
partial aggregation only to the lowest applicable join level where it
is deemed useful. This is done by introducing a "Relids apply_at"
field to track that level and ensuring that partial aggregation is
applied only at the recorded "apply_at" level.
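A toy model of that rule, written in Python purely for illustration:
the class and function names below are hypothetical stand-ins, and only
the condition itself mirrors the early-exit check on apply_at and
agg_useful in generate_grouped_paths() in the attached patch. How the
planner computes those two fields is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggInfo:
    """Hypothetical stand-in for the per-relation aggregation info."""
    apply_at: frozenset  # relids of the level chosen for partial aggregation
    agg_useful: bool     # whether aggregating at that level is deemed useful


def should_generate_grouped_paths(rel_relids: frozenset,
                                  agg_info: AggInfo) -> bool:
    """Add partial aggregation on top of a rel only at the recorded level."""
    return agg_info.agg_useful and rel_relids == agg_info.apply_at


info = AggInfo(apply_at=frozenset({"B"}), agg_useful=True)
at_b = should_generate_grouped_paths(frozenset({"B"}), info)
at_ab = should_generate_grouped_paths(frozenset({"A", "B"}), info)
print(at_b, at_ab)
```

With this rule, a grouped path for {A, B} can only come from joining
the grouped version of B with A, never from aggregating A JOIN B
directly.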
Additionally, this patch changes how grouped relations are stored.
Since each grouped relation represents a partially aggregated version
of a non-grouped relation, we now associate each grouped relation with
the RelOptInfo of the corresponding non-grouped relation. This
eliminates the need for a dedicated list of all grouped relations and
avoids list searches when retrieving a grouped relation.
It also addresses other previously raised concerns, such as the
potential memory blowout risks with large partial-aggregation values,
and includes improvements to comments and the commit message.
Another change is that this feature is now enabled by default.
Thanks
Richard
Attachments:
[application/octet-stream] v17-0001-Implement-Eager-Aggregation.patch (165.3K, 2-v17-0001-Implement-Eager-Aggregation.patch)
From fcdd75d824bc9ee65078ad2dc7337cca22eccf50 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v17] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is on the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 15 +
src/backend/optimizer/README | 89 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 443 ++++++
src/backend/optimizer/path/joinrels.c | 193 +++
src/backend/optimizer/plan/initsplan.c | 313 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/pathnode.c | 12 +-
src/backend/optimizer/util/relnode.c | 636 ++++++++
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/eager_aggregate.out | 1334 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 194 +++
src/tools/pgindent/typedefs.list | 3 +
23 files changed, 3588 insertions(+), 63 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 2185b42bb4f..b9f767df05d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3692,30 +3692,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 59a0874528a..780b4a9fed1 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5470,6 +5470,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..48a575c5bda 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1501,3 +1501,92 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, we partition the tables
+in the FROM clause into two groups: those that contain at least one
+aggregation column, and those that do not contain any aggregation
+columns. Each group can be treated as a single relation formed by the
+Cartesian product of the tables within that group. Therefore, without
+loss of generality, we can assume that the FROM clause contains
+exactly two relations, R1 and R2, where R1 represents the relation
+containing all aggregation columns, and R2 represents the relation
+without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+One restriction is that we cannot push partial aggregation down to a
+relation that is on the nullable side of an outer join, because the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..e75bb41b58d 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,6 +79,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +333,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +602,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1357,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3417,319 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ path->pathkeys,
+ &presorted_keys);
+
+ if (!is_sorted)
+ {
+ /*
+ * Try at least sorting the cheapest path and also try
+ * incrementally sorting any path which is partially sorted
+ * already (no need to deal with paths which have presorted
+ * keys when incremental sort is disabled unless it's the
+ * cheapest input path).
+ */
+ if (input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3889,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3913,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4803,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index aad41b94009..477b0bc3b84 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -35,6 +36,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -763,6 +767,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -874,6 +882,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 01804b085b3..7fa1e5099b1 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -81,6 +82,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool has_internal_aggtranstype(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +632,315 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate uses INTERNAL transition
+ * type.
+ *
+ * Although INTERNAL is marked as pass-by-value, it usually points to a
+ * large internal data structure (like those used by string_agg or
+ * array_agg). These transition states can grow large and their size is
+ * hard to estimate. Applying eager aggregation in such cases risks high
+ * memory usage since partial aggregation results might be stored in join
+ * hash tables or materialized nodes.
+ */
+ if (has_internal_aggtranstype(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * has_internal_aggtranstype
+ * Checks if any aggregate uses INTERNAL transition type.
+ */
+static bool
+has_internal_aggtranstype(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as a grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 549aedcfa99..6289902fc93 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -231,7 +231,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3982,9 +3981,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4066,23 +4063,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7027,16 +7017,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7149,7 +7165,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7167,7 +7183,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7175,7 +7191,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7217,19 +7233,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7269,6 +7283,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7279,6 +7294,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7300,11 +7324,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7529,6 +7555,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8092,13 +8163,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index e0192d4a491..26127eb07d1 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2790,8 +2790,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3046,8 +3045,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3094,8 +3092,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3256,8 +3253,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..c4054b5d03f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,7 +89,22 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
+/*
+ * Minimum average group size required to consider applying eager aggregation.
+ *
+ * This helps avoid the overhead of eager aggregation when it does not offer
+ * significant row count reduction.
+ */
+#define EAGER_AGG_MIN_GROUP_SIZE 20.0
/*
* setup_simple_rel_arrays
@@ -276,6 +297,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -406,6 +429,104 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions, otherwise we cannot reach here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_unique_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -755,6 +876,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -939,6 +1062,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2518,3 +2643,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must already
+ * have been created, if that was possible. So we can just use the parent's
+ * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than EAGER_AGG_MIN_GROUP_SIZE.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= EAGER_AGG_MIN_GROUP_SIZE;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent these grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than EAGER_AGG_MIN_GROUP_SIZE.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= EAGER_AGG_MIN_GROUP_SIZE;
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is on the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in
+ * *group_clauses and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f04bfedb2fd..5a6a3b7406e 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -949,6 +949,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 341f88adc87..00eaf4869e0 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6567759595d..1b03b5f03cf 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -394,6 +394,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupingExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1022,6 +1031,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1095,6 +1112,75 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * EAGER_AGG_MIN_GROUP_SIZE, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* the grouped paths are considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3274,6 +3360,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 60dcdb77e41..01a3532dc2e 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,6 +314,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -353,4 +357,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8410531f2d6..b62f22237b7 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..f02ff0b30a3
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1334 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index a424be2a6bf..929cab14c47 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..5da8749a6cb
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,194 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 32d6e718adc..61b7e6ea049 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1111,6 +1112,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2464,6 +2466,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.43.0
* Re: Eager aggregation, take 3
@ 2025-07-24 03:21 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-07-24 03:21 UTC
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Thu, Jun 26, 2025 at 11:01 AM Richard Guo <[email protected]> wrote:
> Here is the patch based on the proposed ideas. It includes the proof
> of correctness in the README and implements the strategy of pushing
> partial aggregation only to the lowest applicable join level where it
> is deemed useful. This is done by introducing a "Relids apply_at"
> field to track that level and ensuring that partial aggregation is
> applied only at the recorded "apply_at" level.
>
> Additionally, this patch changes how grouped relations are stored.
> Since each grouped relation represents a partially aggregated version
> of a non-grouped relation, we now associate each grouped relation with
> the RelOptInfo of the corresponding non-grouped relation. This
> eliminates the need for a dedicated list of all grouped relations and
> avoids list searches when retrieving a grouped relation.
>
> It also addresses other previously raised concerns, such as the
> potential memory blowout risks with large partial-aggregation values,
> and includes improvements to comments and the commit message.
>
> Another change is that this feature is now enabled by default.
This patch no longer applies; here's a rebased version. Nothing
essential has changed.
Thanks
Richard
Attachments:
[application/octet-stream] v18-0001-Implement-Eager-Aggregation.patch (165.5K, 2-v18-0001-Implement-Eager-Aggregation.patch)
From 23ab3a8c476e130a93b843c6afcba149641169fb Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v18] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
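As a toy illustration of the transformation (a Python sketch over
in-memory lists, not code from this patch; the table contents and
function names are made up), partially aggregating t2 on the join key
before the join and combining the partial states afterwards gives the
same answer as aggregating after the join, while feeding fewer rows
into the join:

```python
from collections import defaultdict

# Hypothetical data: t1 rows are (a, b), t2 rows are (b, c).
t1 = [(1, 10), (1, 20), (2, 10)]
t2 = [(10, 1.0), (10, 2.0), (20, 5.0), (30, 7.0)]


def join_then_aggregate(t1, t2):
    """SELECT t1.a, sum(t2.c), count(*) FROM t1 JOIN t2 USING (b) GROUP BY t1.a"""
    out = defaultdict(lambda: [0.0, 0])
    for a, b1 in t1:
        for b2, c in t2:
            if b1 == b2:
                out[a][0] += c
                out[a][1] += 1
    return {a: tuple(state) for a, state in out.items()}


def partial_aggregate(t2):
    """Partial step: one (sum, count) state per join key b."""
    states = defaultdict(lambda: [0.0, 0])
    for b, c in t2:
        states[b][0] += c
        states[b][1] += 1
    return states


def eager_aggregate(t1, t2):
    """Join t1 to the partial states, then combine them per t1.a."""
    states = partial_aggregate(t2)
    out = defaultdict(lambda: [0.0, 0])
    for a, b in t1:
        if b in states:  # inner join on b
            out[a][0] += states[b][0]
            out[a][1] += states[b][1]
    return {a: tuple(state) for a, state in out.items()}


# Both strategies agree; the join input shrinks from len(t2) rows to
# len(partial_aggregate(t2)) rows.
assert join_then_aggregate(t1, t2) == eager_aggregate(t1, t2)
```

Whether the reduction pays off in a real plan depends on costing; this
sketch only shows that the two forms are equivalent for sum and count.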
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
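The "same destiny" requirement can be checked on a toy model (a Python
sketch with made-up data and helper names, not code from this patch).
For a query grouping by r.y with the upper join clause t1.x = r.x,
partial groups keyed on y alone mix rows with different join outcomes,
while groups keyed on (x, y) do not:

```python
from collections import Counter, defaultdict

# Hypothetical data. r rows are (x, y, v); t1_x holds the values of t1.x.
# Query shape: SELECT r.y, sum(r.v) FROM t1 JOIN r ON t1.x = r.x GROUP BY r.y
t1_x = [1, 1, 2]
r = [(1, "p", 10), (2, "p", 20), (3, "p", 40), (1, "q", 5)]


def partial_groups(rows, key):
    """Split rows into partial groups according to the given grouping key."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return groups


def shares_destiny(groups, outer_keys):
    """True if, within every group, all rows match the same number of t1 rows."""
    matches = Counter(outer_keys)
    return all(
        len({matches[x] for x, _, _ in rows}) == 1 for rows in groups.values()
    )


def reference(rows, outer_keys):
    """Join first, then aggregate per y."""
    out = defaultdict(int)
    for x1 in outer_keys:
        for x, y, v in rows:
            if x1 == x:
                out[y] += v
    return dict(out)


def finalize(groups, outer_keys):
    """Join one aggregated row per (x, y) group to t1, then sum per y."""
    matches = Counter(outer_keys)
    out = defaultdict(int)
    for (x, y), rows in groups.items():
        if matches[x]:
            # The aggregated row appears once per matching t1 row.
            out[y] += matches[x] * sum(v for _, _, v in rows)
    return dict(out)


by_y = partial_groups(r, key=lambda row: (row[1],))
by_xy = partial_groups(r, key=lambda row: (row[0], row[1]))

assert not shares_destiny(by_y, t1_x)   # join column missing from the keys
assert shares_destiny(by_xy, t1_x)      # join column included
assert finalize(by_xy, t1_x) == reference(r, t1_x)
```

This only models an equality join on a single column; the point about
compatible operators is not covered by the sketch.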
One restriction is that we cannot push partial aggregation down to a
relation that is on the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
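A toy model of the failure mode (a Python sketch with made-up data; how
a NULL partial state is combined is an assumption of the model, not a
statement about any particular aggregate) shows count(*) going wrong
when the partial aggregation is placed on the nullable side of a LEFT
JOIN:

```python
from collections import defaultdict

# Hypothetical data: values of t1.b and t2.b.
# Query shape:
#   SELECT t2.b, count(*) FROM t1 LEFT JOIN t2 ON t1.b = t2.b GROUP BY t2.b
t1_b = [10, 20, 30, 30]
t2_b = [10, 10]


def aggregate_after_join(t1_b, t2_b):
    """Regular plan: the top-level aggregate sees the NULL-extended rows."""
    out = defaultdict(int)
    for b1 in t1_b:
        matched = [b2 for b2 in t2_b if b2 == b1]
        for b2 in matched or [None]:  # no match: one NULL-extended row
            out[b2] += 1
    return dict(out)


def naive_pushdown(t1_b, t2_b):
    """Invalid plan: partial count(*) on t2, the nullable side, before the join."""
    partial = defaultdict(int)
    for b2 in t2_b:
        partial[b2] += 1
    out = defaultdict(int)
    for b1 in t1_b:
        if b1 in partial:
            out[b1] += partial[b1]
        else:
            # NULL-extended row: group key and partial state are both NULL,
            # modelled here as contributing 0 to the final count.
            out[None] += 0
    return dict(out)


# The NULL group should count the three unmatched t1 rows, but the
# pushed-down plan never saw them during partial aggregation.
assert aggregate_after_join(t1_b, t2_b) != naive_pushdown(t1_b, t2_b)
```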
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 15 +
src/backend/optimizer/README | 89 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 452 ++++++
src/backend/optimizer/path/joinrels.c | 193 +++
src/backend/optimizer/plan/initsplan.c | 313 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/pathnode.c | 12 +-
src/backend/optimizer/util/relnode.c | 636 ++++++++
src/backend/utils/misc/guc_tables.c | 10 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 5 +
src/include/optimizer/planmain.h | 1 +
src/test/regress/expected/eager_aggregate.out | 1334 +++++++++++++++++
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 194 +++
src/tools/pgindent/typedefs.list | 3 +
23 files changed, 3597 insertions(+), 63 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 4b6e49a5d95..8dea3dee667 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3713,30 +3713,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 20ccb2d6b54..395bca6cf95 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5474,6 +5474,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..48a575c5bda 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1501,3 +1501,92 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, we partition the tables
+in the FROM clause into two groups: those that contain at least one
+aggregation column, and those that do not contain any aggregation
+columns. Each group can be treated as a single relation formed by the
+Cartesian product of the tables within that group. Therefore, without
+loss of generality, we can assume that the FROM clause contains
+exactly two relations, R1 and R2, where R1 represents the relation
+containing all aggregation columns, and R2 represents the relation
+without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+One restriction is that we cannot push partial aggregation down to a
+relation that is on the nullable side of an outer join, because the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..ac922dbf56a 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,6 +79,7 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +333,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +602,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1357,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3417,328 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3898,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3922,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4812,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index aad41b94009..477b0bc3b84 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -35,6 +36,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -763,6 +767,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -874,6 +882,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..3fbccc67190 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -81,6 +82,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool has_internal_aggtranstype(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +632,315 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate uses INTERNAL transition
+ * type.
+ *
+ * Although INTERNAL is marked as pass-by-value, it usually points to a
+ * large internal data structure (like those used by string_agg or
+ * array_agg). These transition states can grow large and their size is
+ * hard to estimate. Applying eager aggregation in such cases risks high
+ * memory usage since partial aggregation results might be stored in join
+ * hash tables or materialized nodes.
+ */
+ if (has_internal_aggtranstype(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * has_internal_aggtranstype
+ * Checks if any aggregate uses INTERNAL transition type.
+ */
+static bool
+has_internal_aggtranstype(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c989e72cac5..6e1d01adbfa 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -231,7 +231,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3970,9 +3969,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4054,23 +4051,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7015,16 +7005,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7137,7 +7153,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7155,7 +7171,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7163,7 +7179,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7205,19 +7221,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7257,6 +7271,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7267,6 +7282,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7288,11 +7312,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7517,6 +7543,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8080,13 +8151,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 9cc602788ea..71d1096012c 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2813,8 +2813,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3069,8 +3068,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3117,8 +3115,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3279,8 +3276,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..c4054b5d03f 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,7 +89,22 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
+/*
+ * Minimum average group size required to consider applying eager aggregation.
+ *
+ * This helps avoid the overhead of eager aggregation when it does not offer
+ * significant row count reduction.
+ */
+#define EAGER_AGG_MIN_GROUP_SIZE 20.0
/*
* setup_simple_rel_arrays
@@ -276,6 +297,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -406,6 +429,104 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions, otherwise we cannot reach here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_unique_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -755,6 +876,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -939,6 +1062,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2518,3 +2643,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must already
+ * have been created, if that was possible. So we can just use the parent's
+ * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than EAGER_AGG_MIN_GROUP_SIZE.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= EAGER_AGG_MIN_GROUP_SIZE;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent these grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than EAGER_AGG_MIN_GROUP_SIZE.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= EAGER_AGG_MIN_GROUP_SIZE;
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is on the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in the
+ * GROUP BY clause nor derivable from it via EquivalenceClasses
+ * (otherwise, it would already have been included in the targets
+ * above). We need to create a special SortGroupClause for it.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
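To make the "same destiny" argument in init_grouping_targets() concrete, here is a minimal standalone sketch in C. It is not planner code: the row types, the helper names (lazy_sum, partial_aggregate, eager_sum) and the linear-scan grouping are all hypothetical and exist only for illustration. It partially aggregates sum(t2.c) by the join key t2.b, joins the partial groups to t1, and finalizes per t1.a. Because the join key is a grouping key, every row of a partial group matches the same t1 rows, so the result should agree with joining first and aggregating afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy rows; these are not PostgreSQL structures. */
typedef struct { int a; int b; } T1Row;        /* t1(a, b) */
typedef struct { int b; double c; } T2Row;     /* t2(b, c) */

/* Partial state of sum(t2.c), one entry per distinct join key b. */
typedef struct { int b; double partial_sum; } PartialGroup;

/* Lazy plan: join t1 and t2 on b, then sum t2.c over rows with t1.a == a. */
static double
lazy_sum(const T1Row *t1, size_t n1, const T2Row *t2, size_t n2, int a)
{
    double sum = 0.0;

    for (size_t i = 0; i < n1; i++)
    {
        if (t1[i].a != a)
            continue;
        for (size_t j = 0; j < n2; j++)
            if (t1[i].b == t2[j].b)
                sum += t2[j].c;
    }
    return sum;
}

/*
 * Eager step 1: partially aggregate t2, grouping by the join key b.
 * "groups" must have room for n2 entries.  Returns the number of groups.
 */
static size_t
partial_aggregate(const T2Row *t2, size_t n2, PartialGroup *groups)
{
    size_t ngroups = 0;

    for (size_t j = 0; j < n2; j++)
    {
        size_t g;

        for (g = 0; g < ngroups; g++)
            if (groups[g].b == t2[j].b)
                break;
        if (g == ngroups)
        {
            groups[g].b = t2[j].b;
            groups[g].partial_sum = 0.0;
            ngroups++;
        }
        groups[g].partial_sum += t2[j].c;
    }
    return ngroups;
}

/* Eager step 2: join t1 to the partial groups and finalize for t1.a == a. */
static double
eager_sum(const T1Row *t1, size_t n1,
          const PartialGroup *groups, size_t ngroups, int a)
{
    double sum = 0.0;

    for (size_t i = 0; i < n1; i++)
    {
        if (t1[i].a != a)
            continue;
        for (size_t g = 0; g < ngroups; g++)
            if (t1[i].b == groups[g].b)
                sum += groups[g].partial_sum;
    }
    return sum;
}
```

With t1 = {(1,10), (1,20), (2,10)} and t2 = {(10,1.0), (10,2.0), (20,4.0), (30,8.0)}, both lazy_sum() and eager_sum() give 7.0 for a = 1, while the join sees three partial groups instead of four t2 rows. If the join key were left out of the grouping keys, rows with different b values would be merged into one group and the join could no longer tell them apart.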
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index d14b1678e7f..5ef8b824a7b 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -949,6 +949,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a9d8293474a..0eb755d61da 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e5dd15098f6..c9df12aa38e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -397,6 +397,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupingExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1024,6 +1033,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1097,6 +1114,75 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * EAGER_AGG_MIN_GROUP_SIZE, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* the grouped paths are considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3276,6 +3362,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
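As a rough illustration of the two checks that agg_eval_at and agg_useful feed into, the sketch below models Relids as a 64-bit mask. Everything prefixed toy_ or TOY_ is hypothetical. In particular, TOY_MIN_GROUP_SIZE = 8.0 is only an assumed stand-in, since the value of EAGER_AGG_MIN_GROUP_SIZE is not shown in this part of the patch. The real code uses Bitmapset and bms_is_subset() rather than plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for Relids: bit i set means RT index i is a member. */
typedef uint64_t ToyRelids;

/* Assumed threshold for illustration; not taken from the patch. */
#define TOY_MIN_GROUP_SIZE 8.0

/* True if every member of "a" is also a member of "b". */
static bool
toy_is_subset(ToyRelids a, ToyRelids b)
{
    return (a & ~b) == 0;
}

/*
 * Mirrors the loop over root->agg_clause_list: partial aggregation can be
 * considered at a relation only if every aggregate can be evaluated there.
 */
static bool
toy_aggs_evaluable(const ToyRelids *agg_eval_at, int naggs,
                   ToyRelids rel_relids)
{
    for (int i = 0; i < naggs; i++)
        if (!toy_is_subset(agg_eval_at[i], rel_relids))
            return false;
    return true;
}

/*
 * Mirrors the agg_useful computation: grouped paths are worth keeping only
 * if the average group size reaches the threshold.
 */
static bool
toy_agg_useful(double rows, double grouped_rows)
{
    return (rows / grouped_rows) >= TOY_MIN_GROUP_SIZE;
}
```

In the regression test's terms: avg(t2.c + t3.c) needs both t2 and t3, so it cannot be pushed down to t2 alone but can be pushed to the t2/t3 join. Likewise, grouping the 1000-row eager_agg_t2 by b yields 10 groups (average group size 100), whereas grouping eager_agg_t1 by b yields 1000 groups of size 1, which would not pass any threshold above 1.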
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 60dcdb77e41..01a3532dc2e 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,6 +314,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -353,4 +357,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8410531f2d6..b62f22237b7 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,6 +21,7 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
@@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..f02ff0b30a3
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1334 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..5da8749a6cb
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,194 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a8656419cb6..37053d9d769 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2471,6 +2473,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.43.0
* Re: Eager aggregation, take 3
@ 2025-08-06 07:52 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 3 replies; 55+ messages in thread
From: Richard Guo @ 2025-08-06 07:52 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Thu, Jul 24, 2025 at 12:21 PM Richard Guo <[email protected]> wrote:
> This patch no longer applies; here's a rebased version. Nothing
> essential has changed.
Based on some off-list testing by Matheus (CC'ed), several TPC-DS
queries that used to apply eager aggregation no longer do, which
suggests that the v18 patch is too strict about when eager aggregation
can be used.
I looked into query 4 and query 11, and found two reasons why they no
longer apply eager aggregation with v18.
* The has_internal_aggtranstype() check.
To avoid potential memory blowout risks from large partial aggregation
values, v18 avoids applying eager aggregation if any aggregate uses an
INTERNAL transition type, as this typically indicates a large internal
data structure (as in string_agg or array_agg). However, this also
excludes aggregates like avg(numeric) and sum(numeric), which are
actually safe to use with eager aggregation.
What we really want to exclude are aggregate functions that can
produce large transition values by accumulating or concatenating input
rows. So I'm wondering if we could instead check the transfn_oid
directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and
F_STRING_AGG_TRANSFN. We don't need to worry about json_agg,
jsonb_agg, or xmlagg, since they don't support partial aggregation
anyway.
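For illustration, a minimal C sketch of the exclusion check proposed above. It is not code from the patch: the DemoOid type, the DEMO_* constants (stand-ins for F_ARRAY_AGG_TRANSFN and F_STRING_AGG_TRANSFN with placeholder values), and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a function OID type. */
typedef unsigned int DemoOid;

/*
 * Placeholder identifiers. The numeric values are illustrative only and
 * are NOT the real OIDs of the corresponding transition functions.
 */
enum
{
    DEMO_ARRAY_AGG_TRANSFN = 1,
    DEMO_STRING_AGG_TRANSFN = 2,
    DEMO_OTHER_TRANSFN = 3
};

/* True if the transition function accumulates or concatenates its input. */
bool
demo_transfn_can_grow(DemoOid transfn_oid)
{
    return transfn_oid == DEMO_ARRAY_AGG_TRANSFN ||
           transfn_oid == DEMO_STRING_AGG_TRANSFN;
}

/*
 * Eager aggregation is considered only if none of the query's aggregates
 * uses an excluded transition function.
 */
bool
demo_eager_agg_allowed(const DemoOid *transfn_oids, size_t naggs)
{
    for (size_t i = 0; i < naggs; i++)
    {
        if (demo_transfn_can_grow(transfn_oids[i]))
            return false;
    }
    return true;
}
```

A deny-list like this keeps every other aggregate eligible by default, which is the trade-off compared with rejecting all INTERNAL transition types.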
* The EAGER_AGG_MIN_GROUP_SIZE threshold
This threshold defines the minimum average group size required to
consider applying eager aggregation. It was previously set to 2, but
in v18 it was increased to 20 to be cautious about planning overhead.
This change was a snap decision though, without any profiling or data
to back it.
Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum
needed to consider eager aggregation for them. The resulting plans
show nice performance improvements without any measurable increase in
planning time. So, I'm inclined to lower the threshold to 10 for now.
(Wondering whether we should make this threshold a GUC, so users can
adjust it based on their needs.)
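As a rough sketch of how such a threshold could gate the optimization, assuming "average group size" means estimated input rows divided by estimated number of groups, and assuming an inclusive comparison (both are assumptions, not taken from the patch; the function name is hypothetical):

```c
#include <stdbool.h>

/*
 * Returns true when the estimated average group size reaches the
 * threshold. Assumption: average group size = input_rows / num_groups.
 */
bool
demo_group_size_ok(double input_rows, double num_groups,
                   double min_group_size)
{
    /* Without usable estimates, do not consider eager aggregation. */
    if (input_rows <= 0.0 || num_groups <= 0.0)
        return false;

    return (input_rows / num_groups) >= min_group_size;
}
```

Under these assumptions, a relation with 1000 rows in 100 groups (average group size 10) passes a threshold of 2 or 10, but not one of 20.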
With these two changes, here are the planning and execution times for
queries 4 and 11 (scale factor 1) on my snail-paced machine, with and
without eager aggregation.
query 4:
-- without eager aggregation
Planning Time: 6.765 ms
Execution Time: 34941.713 ms
-- with eager aggregation
Planning Time: 6.674 ms
Execution Time: 13994.183 ms
query 11:
-- without eager aggregation
Planning Time: 3.757 ms
Execution Time: 20888.076 ms
-- with eager aggregation
Planning Time: 3.747 ms
Execution Time: 7449.522 ms
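To put the execution times above in relative terms, a small C helper (hypothetical name, added for illustration only) computes the ratio of the time without eager aggregation to the time with it:

```c
/* Ratio of execution time without eager aggregation to the time with it. */
double
demo_speedup(double without_ms, double with_ms)
{
    return without_ms / with_ms;
}
```

With the numbers reported here, demo_speedup(34941.713, 13994.183) is roughly 2.5 for query 4, and demo_speedup(20888.076, 7449.522) is roughly 2.8 for query 11.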
Any comments on these two changes?
Thanks
Richard
* Re: Eager aggregation, take 3
@ 2025-08-06 13:44 Matheus Alcantara <[email protected]>
parent: Richard Guo <[email protected]>
2 siblings, 1 reply; 55+ messages in thread
From: Matheus Alcantara @ 2025-08-06 13:44 UTC (permalink / raw)
To: Richard Guo <[email protected]>; Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Wed Aug 6, 2025 at 4:52 AM -03, Richard Guo wrote:
> On Thu, Jul 24, 2025 at 12:21 PM Richard Guo <[email protected]> wrote:
>> This patch no longer applies; here's a rebased version. Nothing
>> essential has changed.
>
> Based on some off-list testing by Matheus (CC'ed), several TPC-DS
> queries that used to apply eager aggregation no longer do, which
> suggests that the v18 patch is too strict about when eager aggregation
> can be used.
>
> I looked into query 4 and query 11, and found two reasons why they no
> longer apply eager aggregation with v18.
>
> * The has_internal_aggtranstype() check.
>
> To avoid potential memory blowout risks from large partial aggregation
> values, v18 avoids applying eager aggregation if any aggregate uses an
> INTERNAL transition type, as this typically indicates a large internal
> data structure (as in string_agg or array_agg). However, this also
> excludes aggregates like avg(numeric) and sum(numeric), which are
> actually safe to use with eager aggregation.
>
> What we really want to exclude are aggregate functions that can
> produce large transition values by accumulating or concatenating input
> rows. So I'm wondering if we could instead check the transfn_oid
> directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and
> F_STRING_AGG_TRANSFN. We don't need to worry about json_agg,
> jsonb_agg, or xmlagg, since they don't support partial aggregation
> anyway.
>
That makes sense to me. I'm just wondering whether we should follow an
"allow" or "don't-allow" strategy. I mean, instead of listing the
aggregate functions that are not allowed, we could list the functions
that are allowed to use eager aggregation, so that we know eager
aggregation works properly for every function that is enabled.
> * The EAGER_AGG_MIN_GROUP_SIZE threshold
>
> This threshold defines the minimum average group size required to
> consider applying eager aggregation. It was previously set to 2, but
> in v18 it was increased to 20 to be cautious about planning overhead.
> This change was a snap decision though, without any profiling or data
> to back it.
>
> Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum
> needed to consider eager aggregation for them. The resulting plans
> show nice performance improvements without any measurable increase in
> planning time. So, I'm inclined to lower the threshold to 10 for now.
> (Wondering whether we should make this threshold a GUC, so users can
> adjust it based on their needs.)
>
Having a GUC sounds like a good idea to me, TBH. This threshold may
vary from workload to workload (?).
>
> With these two changes, here are the planning and execution times for
> queries 4 and 11 (scale factor 1) on my snail-paced machine, with and
> without eager aggregation.
>
> query 4:
> -- without eager aggregation
> Planning Time: 6.765 ms
> Execution Time: 34941.713 ms
> -- with eager aggregation
> Planning Time: 6.674 ms
> Execution Time: 13994.183 ms
>
> query 11:
> -- without eager aggregation
> Planning Time: 3.757 ms
> Execution Time: 20888.076 ms
> -- with eager aggregation
> Planning Time: 3.747 ms
> Execution Time: 7449.522 ms
>
> Any comments on these two changes?
>
That sounds like a good way to go to me. Looking forward to the next
patch version so I can run some more tests.
Thanks
--
Matheus Alcantara
* Re: Eager aggregation, take 3
@ 2025-08-09 01:32 Richard Guo <[email protected]>
parent: Matheus Alcantara <[email protected]>
0 siblings, 2 replies; 55+ messages in thread
From: Richard Guo @ 2025-08-09 01:32 UTC (permalink / raw)
To: Matheus Alcantara <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Wed, Aug 6, 2025 at 10:44 PM Matheus Alcantara
<[email protected]> wrote:
> On Wed Aug 6, 2025 at 4:52 AM -03, Richard Guo wrote:
> > * The has_internal_aggtranstype() check.
> >
> > To avoid potential memory blowout risks from large partial aggregation
> > values, v18 avoids applying eager aggregation if any aggregate uses an
> > INTERNAL transition type, as this typically indicates a large internal
> > data structure (as in string_agg or array_agg). However, this also
> > excludes aggregates like avg(numeric) and sum(numeric), which are
> > actually safe to use with eager aggregation.
> >
> > What we really want to exclude are aggregate functions that can
> > produce large transition values by accumulating or concatenating input
> > rows. So I'm wondering if we could instead check the transfn_oid
> > directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and
> > F_STRING_AGG_TRANSFN. We don't need to worry about json_agg,
> > jsonb_agg, or xmlagg, since they don't support partial aggregation
> > anyway.
> That makes sense to me. I'm just wondering whether we should follow an
> "allow" or "don't-allow" strategy. I mean, instead of listing the
> aggregate functions that are not allowed, we could list the functions
> that are allowed to use eager aggregation, so that we know eager
> aggregation works properly for every function that is enabled.
I ended up still checking for INTERNAL transition types, but
explicitly exempted from that check aggregates that use the
F_NUMERIC_AVG_ACCUM transition function, assuming that avg(numeric)
and sum(numeric) are safe in this context. This might still be overly
strict, but I prefer to be on the safe side for now.
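To illustrate why a sum/count style aggregate is considered safe here, a simplified C sketch of two-stage averaging follows. It is an assumption-laden model, not the patch's code: the real avg(numeric) keeps a more elaborate internal state, and the DemoAvgState struct and demo_avg_* functions are hypothetical names using plain doubles. The point is that the transition state stays the same size no matter how many rows are folded in, unlike an aggregate that concatenates its input.

```c
#include <stddef.h>

/* Fixed-size transition state: it does not grow with the input. */
typedef struct DemoAvgState
{
    double sum;
    long   count;
} DemoAvgState;

/* Partial aggregation: fold one partial group of values into a state. */
DemoAvgState
demo_avg_partial(const double *vals, size_t n)
{
    DemoAvgState st = {0.0, 0};

    for (size_t i = 0; i < n; i++)
    {
        st.sum += vals[i];
        st.count++;
    }
    return st;
}

/* Combine step: merge two partial states after the join. */
DemoAvgState
demo_avg_combine(DemoAvgState a, DemoAvgState b)
{
    DemoAvgState st = {a.sum + b.sum, a.count + b.count};

    return st;
}

/*
 * Final step: turn the combined state into the average. An empty state
 * returns 0.0 here for simplicity; SQL would yield NULL instead.
 */
double
demo_avg_final(DemoAvgState st)
{
    return st.count > 0 ? st.sum / (double) st.count : 0.0;
}
```

Averaging {1, 2, 3} and {4, 5} as two partial groups and then combining them gives the same result as averaging all five values at once.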
> > * The EAGER_AGG_MIN_GROUP_SIZE threshold
> >
> > This threshold defines the minimum average group size required to
> > consider applying eager aggregation. It was previously set to 2, but
> > in v18 it was increased to 20 to be cautious about planning overhead.
> > This change was a snap decision though, without any profiling or data
> > to back it.
> >
> > Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum
> > needed to consider eager aggregation for them. The resulting plans
> > show nice performance improvements without any measurable increase in
> > planning time. So, I'm inclined to lower the threshold to 10 for now.
> > (Wondering whether we should make this threshold a GUC, so users can
> > adjust it based on their needs.)
> Having a GUC sounds like a good idea to me, TBH. This threshold may
> vary from workload to workload (?).
I've made this threshold a GUC, with a default value of 8 (further
benchmark testing showed that a value of 10 is still too strict for
TPC-DS query 4).
> > Any comments on these two changes?
> That sounds like a good way to go to me. Looking forward to the next
> patch version so I can run some more tests.
OK. Here it is.
Thanks
Richard
Attachments:
[application/octet-stream] v19-0001-Implement-Eager-Aggregation.patch (174.0K, 2-v19-0001-Implement-Eager-Aggregation.patch)
From 22999025da5f400b4b780df13dce008665c5c372 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v19] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 89 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 453 ++++++
src/backend/optimizer/path/joinrels.c | 193 +++
src/backend/optimizer/plan/initsplan.c | 322 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/pathnode.c | 12 +-
src/backend/optimizer/util/relnode.c | 629 ++++++++
src/backend/utils/misc/guc_tables.c | 21 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 6 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1334 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 194 +++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
27 files changed, 3658 insertions(+), 82 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index a434eb1395e..e05dcb44947 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3713,30 +3713,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 20ccb2d6b54..5400bd8f18f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5474,6 +5474,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6094,6 +6109,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 9c724ccfabf..48a575c5bda 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1501,3 +1501,92 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, we partition the tables
+in the FROM clause into two groups: those that contain at least one
+aggregation column, and those that do not contain any aggregation
+columns. Each group can be treated as a single relation formed by the
+Cartesian product of the tables within that group. Therefore, without
+loss of generality, we can assume that the FROM clause contains
+exactly two relations, R1 and R2, where R1 represents the relation
+containing all aggregation columns, and R2 represents the relation
+without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+One restriction is that we cannot push partial aggregation down to a
+relation that is in the nullable side of an outer join, because the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may cause the
+rows to be grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..7b349a4570e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +334,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +603,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1358,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3418,328 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3899,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3923,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4813,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index aad41b94009..477b0bc3b84 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -35,6 +36,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -763,6 +767,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -874,6 +882,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..9cc8c558ccf 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +633,323 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Checks if any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate uses an INTERNAL transition type. Although INTERNAL
+ * is marked as pass-by-value, it usually points to a large internal data
+ * structure (like those used by string_agg or array_agg). These transition
+ * states can grow large and their size is hard to estimate. Applying eager
+ * aggregation in such cases risks high memory usage since partial aggregation
+ * results might be stored in join hash tables or materialized nodes.
+ *
+ * We explicitly exclude aggregates with the F_NUMERIC_AVG_ACCUM transition
+ * function from this check, based on the assumption that avg(numeric) and
+ * sum(numeric) are safe in this context.
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM)
+ continue;
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
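
A note on the equal-image check in create_grouping_expr_infos() above: the
sketch below is only a toy illustration of why "equal" and "same byte image"
can differ. ToyDecimal, toy_equal() and toy_image_equal() are hypothetical
names invented for this example; this is not how NUMERIC is stored and not
what BTEQUALIMAGE_PROC actually executes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy decimal: value = digits / 10^scale. */
typedef struct ToyDecimal
{
    long digits;
    int  scale;
} ToyDecimal;

static long
toy_pow10(int n)
{
    long result = 1;

    assert(n >= 0);
    while (n-- > 0)
        result *= 10;
    return result;
}

/* Equality as an "=" operator would see it: compare the numeric values. */
static bool
toy_equal(ToyDecimal a, ToyDecimal b)
{
    int maxscale = (a.scale > b.scale) ? a.scale : b.scale;

    return a.digits * toy_pow10(maxscale - a.scale) ==
           b.digits * toy_pow10(maxscale - b.scale);
}

/* Image equality: the stored representation must match field by field. */
static bool
toy_image_equal(ToyDecimal a, ToyDecimal b)
{
    return a.digits == b.digits && a.scale == b.scale;
}
```

With these definitions, {0, 0} and {0, 1} (that is, 0 and 0.0) satisfy
toy_equal() but not toy_image_equal(). Grouping on such a key would keep only
one of the two representations, which is the loss of information the comment
warns about.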
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index d59d6e4c6a0..d361319d0b5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -231,7 +231,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -3971,9 +3970,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4055,23 +4052,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7016,16 +7006,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7138,7 +7154,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7156,7 +7172,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7164,7 +7180,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7206,19 +7222,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7258,6 +7272,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7268,6 +7283,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7289,11 +7313,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7518,6 +7544,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8081,13 +8152,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
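
As a rough illustration of what the planner.c changes above finalize: the
sketch below compares SUM(v) GROUP BY g computed after a join with the same
sum computed by partially aggregating on (g, j) below the join and combining
the partial sums afterwards. It is a toy model under stated assumptions (SUM
only, small integer keys, nested-loop join); ToyRow and both functions are
hypothetical names, not planner code.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NGROUPS   2   /* distinct values of the GROUP BY key g */
#define TOY_NJOINKEYS 3   /* distinct values of the join column j */

typedef struct ToyRow
{
    int  g;               /* grouping key */
    int  j;               /* join column */
    long v;               /* aggregated value */
} ToyRow;

/* Plain plan: join a with b on j, then SUM(v) GROUP BY g. */
static void
toy_sum_after_join(const ToyRow *a, size_t na, const int *b, size_t nb,
                   long out[TOY_NGROUPS])
{
    for (int g = 0; g < TOY_NGROUPS; g++)
        out[g] = 0;

    for (size_t i = 0; i < na; i++)
    {
        assert(a[i].g >= 0 && a[i].g < TOY_NGROUPS);
        for (size_t k = 0; k < nb; k++)
            if (a[i].j == b[k])
                out[a[i].g] += a[i].v;
    }
}

/* Eager plan: partial SUM(v) GROUP BY (g, j), then join, then final SUM. */
static void
toy_sum_before_join(const ToyRow *a, size_t na, const int *b, size_t nb,
                    long out[TOY_NGROUPS])
{
    long partial[TOY_NGROUPS][TOY_NJOINKEYS] = {{0}};

    /* Partial aggregation; the join column is part of the grouping key. */
    for (size_t i = 0; i < na; i++)
    {
        assert(a[i].g >= 0 && a[i].g < TOY_NGROUPS);
        assert(a[i].j >= 0 && a[i].j < TOY_NJOINKEYS);
        partial[a[i].g][a[i].j] += a[i].v;
    }

    /* Join each partial group with b, then combine the partial sums by g. */
    for (int g = 0; g < TOY_NGROUPS; g++)
    {
        out[g] = 0;
        for (int j = 0; j < TOY_NJOINKEYS; j++)
            for (size_t k = 0; k < nb; k++)
                if (b[k] == j)
                    out[g] += partial[g][j];
    }
}
```

Because j is part of the partial grouping key, every row folded into one
partial group matches the same rows of b, so both functions return the same
totals while the eager variant feeds fewer rows into the join.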
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
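
The PathTarget branch above flat-copies the node and then gives the copy its
own sortgrouprefs array. A minimal standalone sketch of that pattern follows;
ToyTarget and its helpers are hypothetical, and malloc() stands in for
palloc(), so this only illustrates the copy discipline rather than the
mutator itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a node that owns a separately allocated array. */
typedef struct ToyTarget
{
    int       nexprs;         /* number of entries in sortgrouprefs */
    double    width;          /* an example of a flat-copiable field */
    unsigned *sortgrouprefs;  /* array of nexprs entries, or NULL */
} ToyTarget;

static ToyTarget *
toy_copy_target(const ToyTarget *old)
{
    ToyTarget *copy = malloc(sizeof(ToyTarget));

    assert(copy != NULL);

    /* Copy all flat-copiable fields; this also copies the array pointer. */
    memcpy(copy, old, sizeof(ToyTarget));

    /* Replace the aliased pointer with a private copy of the array. */
    if (old->sortgrouprefs != NULL && old->nexprs > 0)
    {
        size_t nbytes = (size_t) old->nexprs * sizeof(unsigned);

        copy->sortgrouprefs = malloc(nbytes);
        assert(copy->sortgrouprefs != NULL);
        memcpy(copy->sortgrouprefs, old->sortgrouprefs, nbytes);
    }
    else
        copy->sortgrouprefs = NULL;

    return copy;
}

static void
toy_free_target(ToyTarget *target)
{
    free(target->sortgrouprefs);
    free(target);
}
```

Without the second allocation, the copy and the original would share one
array, and a later change through either node would silently affect the
other.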
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index a4c5867cdcb..5a2e723bc29 100644
--- a/src/backend/optimizer/util/pathnode.c
+++ b/src/backend/optimizer/util/pathnode.c
@@ -2818,8 +2818,7 @@ create_projection_path(PlannerInfo *root,
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe &&
@@ -3074,8 +3073,7 @@ create_incremental_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3122,8 +3120,7 @@ create_sort_path(PlannerInfo *root,
pathnode->path.parent = rel;
/* Sort doesn't project, so use source path's pathtarget */
pathnode->path.pathtarget = subpath->pathtarget;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
@@ -3284,8 +3281,7 @@ create_agg_path(PlannerInfo *root,
pathnode->path.pathtype = T_Agg;
pathnode->path.parent = rel;
pathnode->path.pathtarget = target;
- /* For now, assume we are above any joins, so no parameterization */
- pathnode->path.param_info = NULL;
+ pathnode->path.param_info = subpath->param_info;
pathnode->path.parallel_aware = false;
pathnode->path.parallel_safe = rel->consider_parallel &&
subpath->parallel_safe;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index ff507331a06..bd28687dc81 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -276,6 +290,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -406,6 +422,104 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions, otherwise we cannot reach here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_unique_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -755,6 +869,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -939,6 +1055,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2518,3 +2636,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must already
+ * have been created, if that was possible. So we can just use the parent's
+ * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent these grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than min_eager_agg_group_size.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is on the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index d14b1678e7f..cdf8da02960 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -949,6 +949,16 @@ struct config_bool ConfigureNamesBool[] =
false,
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
@@ -3980,6 +3990,17 @@ struct config_real ConfigureNamesReal[] =
NULL, NULL, NULL
},
+ {
+ {"min_eager_agg_group_size", PGC_USERSET, QUERY_TUNING_COST,
+ gettext_noop("Sets the minimum average group size required to consider applying eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &min_eager_agg_group_size,
+ 8.0, 0.0, DBL_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"cursor_tuple_fraction", PGC_USERSET, QUERY_TUNING_OTHER,
gettext_noop("Sets the planner's estimate of the fraction of "
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a9d8293474a..e3cdfe11992 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ad2726f026f..a6175cbecaf 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -397,6 +397,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupingExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1024,6 +1033,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1097,6 +1114,75 @@ typedef struct RelOptInfo
((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \
(rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs)
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* the grouped paths are considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3278,6 +3364,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily used for equality comparison */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 58936e963cb..cbdbc4978f6 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -314,6 +314,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -353,4 +357,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index 8410531f2d6..9f6bad1faca 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..f02ff0b30a3
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1334 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation explicitly; it is on by default.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
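
A note for readers checking the expected output above by hand: the multi-level partitioning results can be reproduced outside the server. The following is only a minimal sketch in Python, not part of the patch and not how the planner implements anything. It assumes the same data as the test (`i % 30, i % 30` for `i` in 1..1000) and models the plan shape shown in the EXPLAIN output: partially aggregate `t2` on the join key, join to `t1`, then finalize. The function names `join_then_aggregate` and `eager_aggregate` are made up for illustration.

```python
from collections import defaultdict

# Same rows as: INSERT INTO eager_agg_tab_ml
#               SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
rows = [(i % 30, i % 30) for i in range(1, 1001)]


def join_then_aggregate(t1, t2):
    """Baseline: join on x first, then GROUP BY t1.x with sum(t2.y), count(*)."""
    t2_by_x = defaultdict(list)
    for x, y in t2:
        t2_by_x[x].append(y)
    groups = defaultdict(lambda: [0, 0])  # x -> [sum(t2.y), count(*)]
    for x, _ in t1:
        for y in t2_by_x.get(x, ()):
            groups[x][0] += y
            groups[x][1] += 1
    return {x: tuple(v) for x, v in groups.items()}


def eager_aggregate(t1, t2):
    """Model of the eager plan: partial aggregate below the join, finalize above."""
    # Partial aggregate of t2, grouped by the join key t2.x.
    partial = defaultdict(lambda: [0, 0])  # x -> [partial sum, partial count]
    for x, y in t2:
        partial[x][0] += y
        partial[x][1] += 1
    # Join: each t1 row now matches at most one partial row,
    # and the finalize step combines the partial states per group.
    final = defaultdict(lambda: [0, 0])
    for x, _ in t1:
        if x in partial:
            final[x][0] += partial[x][0]
            final[x][1] += partial[x][1]
    return {x: tuple(v) for x, v in final.items()}


baseline = join_then_aggregate(rows, rows)
eager = eager_aggregate(rows, rows)
assert baseline == eager
print(eager[0], eager[1], eager[11])  # (0, 1089) (1156, 1156) (11979, 1089)
```

The printed values match the rows for x = 0, 1 and 11 in the expected output of the `GROUP BY t1.x` query. The counts follow from the data: residues 1..10 occur 34 times and the rest 33 times, giving 34 * 34 = 1156 and 33 * 33 = 1089 joined rows per group. In this model the join probes 30 partial rows instead of 1000 base rows, which is the reduction in join input that eager aggregation is after.
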
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 4d5d35d0727..b764284d9c0 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2837,20 +2837,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 5f2c0cf5786..1f56f55155b 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..5da8749a6cb
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,194 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index e6f2e93b2d6..052e6b7b920 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2472,6 +2474,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.43.0
* Re: Eager aggregation, take 3
@ 2025-08-14 19:22 Matheus Alcantara <[email protected]>
parent: Richard Guo <[email protected]>
1 sibling, 1 reply; 55+ messages in thread
From: Matheus Alcantara @ 2025-08-14 19:22 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On 08/08/25 22:32, Richard Guo wrote:
>> It sounds like a good way to go for me, looking forward to the next
>> patch version to perform some other tests.
>
> OK. Here it is.
>
Thanks! I can confirm now that I can see the eager aggregate in action
in some of these queries that I've tested on the TPC-DS benchmark.
A few questions regarding the new version:
I've noticed that when a query has a WHERE clause that uses the "="
operator to filter a column of the relation being aggregated, the
Partial and Finalize aggregation nodes are not present in the EXPLAIN
output, even though setup_eager_aggregation() gets past all of its if
checks and RelAggInfo->agg_useful is true. For example, consider this
query, which is used in the eager aggregation paper and runs against
some tables from the TPC-H benchmark:
tpch=# show enable_eager_aggregate ;
enable_eager_aggregate
------------------------
on
(1 row)
tpch=# set max_parallel_workers_per_gather to 0;
SET
tpch=# EXPLAIN(COSTS OFF) SELECT O_CLERK,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS LOSS
FROM LINEITEM
JOIN ORDERS ON L_ORDERKEY = O_ORDERKEY
WHERE L_RETURNFLAG = 'R'
GROUP BY O_CLERK;
                          QUERY PLAN
--------------------------------------------------------------
 HashAggregate
   Group Key: orders.o_clerk
   ->  Hash Join
         Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
         ->  Seq Scan on lineitem
               Filter: (l_returnflag = 'R'::bpchar)
         ->  Hash
               ->  Seq Scan on orders
(8 rows)
Debugging this query shows that all the if conditions in
setup_eager_aggregation() evaluate to false, and create_agg_clause_infos()
and create_grouping_expr_infos() are called. RelAggInfo->agg_useful
is also set to true, so I would expect to see Finalize and Partial
agg nodes. Is this correct, or am I missing something here?
Removing the WHERE clause I can see the Finalize and Partial agg nodes:
tpch=# EXPLAIN(COSTS OFF) SELECT O_CLERK,
SUM(L_EXTENDEDPRICE * (1 - L_DISCOUNT)) AS LOSS
FROM LINEITEM
JOIN ORDERS ON L_ORDERKEY = O_ORDERKEY
GROUP BY O_CLERK;
                              QUERY PLAN
----------------------------------------------------------------------
 Finalize HashAggregate
   Group Key: orders.o_clerk
   ->  Merge Join
         Merge Cond: (lineitem.l_orderkey = orders.o_orderkey)
         ->  Partial GroupAggregate
               Group Key: lineitem.l_orderkey
               ->  Index Scan using idx_lineitem_orderkey on lineitem
         ->  Index Scan using orders_pkey on orders
(8 rows)
This can also be reproduced by adding a WHERE clause to some of the
tests in eager_aggregate.sql:
postgres=# EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, avg(t2.c)
FROM eager_agg_t1 t1
JOIN eager_agg_t2 t2
ON t1.b = t2.b
WHERE t2.c = 5
GROUP BY t1.a
ORDER BY t1.a;
                            QUERY PLAN
------------------------------------------------------------------
 GroupAggregate
   Output: t1.a, avg(t2.c)
   Group Key: t1.a
   ->  Sort
         Output: t1.a, t2.c
         Sort Key: t1.a
         ->  Hash Join
               Output: t1.a, t2.c
               Hash Cond: (t1.b = t2.b)
               ->  Seq Scan on public.eager_agg_t1 t1
                     Output: t1.a, t1.b, t1.c
               ->  Hash
                     Output: t2.c, t2.b
                     ->  Seq Scan on public.eager_agg_t2 t2
                           Output: t2.c, t2.b
                           Filter: (t2.c = '5'::double precision)
(16 rows)
Note that if I use the ">" operator, for example, this doesn't happen:
SELECT t1.a, avg(t2.c)
FROM eager_agg_t1 t1
JOIN eager_agg_t2 t2
ON t1.b = t2.b
WHERE t2.c > 5
GROUP BY t1.a
ORDER BY t1.a;
                               QUERY PLAN
------------------------------------------------------------------------
 Finalize GroupAggregate
   Output: t1.a, avg(t2.c)
   Group Key: t1.a
   ->  Sort
         Output: t1.a, (PARTIAL avg(t2.c))
         Sort Key: t1.a
         ->  Hash Join
               Output: t1.a, (PARTIAL avg(t2.c))
               Hash Cond: (t1.b = t2.b)
               ->  Seq Scan on public.eager_agg_t1 t1
                     Output: t1.a, t1.b, t1.c
               ->  Hash
                     Output: t2.b, (PARTIAL avg(t2.c))
                     ->  Partial HashAggregate
                           Output: t2.b, PARTIAL avg(t2.c)
                           Group Key: t2.b
                           ->  Seq Scan on public.eager_agg_t2 t2
                                 Output: t2.a, t2.b, t2.c
                                 Filter: (t2.c > '5'::double precision)
(19 rows)
Is this behavior correct? If so, would it be possible to check for
this limitation in setup_eager_aggregation() and maybe skip all the
other work?
--
Matheus Alcantara
* Re: Eager aggregation, take 3
@ 2025-08-15 01:41 Richard Guo <[email protected]>
parent: Matheus Alcantara <[email protected]>
0 siblings, 0 replies; 55+ messages in thread
From: Richard Guo @ 2025-08-15 01:41 UTC (permalink / raw)
To: Matheus Alcantara <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Fri, Aug 15, 2025 at 4:22 AM Matheus Alcantara
<[email protected]> wrote:
> Debugging this query shows that all the if conditions in
> setup_eager_aggregation() evaluate to false, and create_agg_clause_infos()
> and create_grouping_expr_infos() are called. RelAggInfo->agg_useful
> is also set to true, so I would expect to see Finalize and Partial
> agg nodes. Is this correct, or am I missing something here?
Well, just because eager aggregation *can* be applied does not mean
that it *will* be; it depends on whether it produces a lower-cost
execution plan. This transformation is cost-based, so it's not the
right mindset to assume that it will always be applied when possible.
In your case, with the filter "t2.c = 5", the row estimate for t2 is
just 1 after the filter has been applied. The planner decides that
adding a partial aggregation on top of such a small result set doesn't
offer much benefit, which seems reasonable to me.
->  Hash  (cost=18.50..18.50 rows=1 width=12) (actual time=0.864..0.865 rows=1.00 loops=1)
      Buckets: 1024  Batches: 1  Memory Usage: 9kB
      ->  Seq Scan on eager_agg_t2 t2  (cost=0.00..18.50 rows=1 width=12) (actual time=0.060..0.851 rows=1.00 loops=1)
            Filter: (c = '5'::double precision)
            Rows Removed by Filter: 999
With the filter "t2.c > 5", the row estimate for t2 is 995 after
filtering. A partial aggregation can reduce that to 10 rows, so the
planner decides that adding a partial aggregation is beneficial -- and
does so. That also seems reasonable to me.
->  Partial HashAggregate  (cost=23.48..23.58 rows=10 width=36) (actual time=2.427..2.438 rows=10.00 loops=1)
      Group Key: t2.b
      Batches: 1  Memory Usage: 32kB
      ->  Seq Scan on eager_agg_t2 t2  (cost=0.00..18.50 rows=995 width=12) (actual time=0.053..0.989 rows=995.00 loops=1)
            Filter: (c > '5'::double precision)
            Rows Removed by Filter: 5
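To put numbers on the two cases, here is a back-of-the-envelope sketch
in C. It is not the planner's cost model and not code from the patch:
row_reduction_factor() and rows_removed_before_join() are hypothetical
helpers that only do arithmetic on the row estimates quoted above. For
the "=" case, the group count of 1 is an assumption that follows from
having a single estimated input row.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the patch: how many input rows are
 * folded into each partially aggregated row, given row-count estimates.
 */
double
row_reduction_factor(double input_rows, double groups)
{
    assert(groups > 0.0);
    assert(input_rows >= groups);
    return input_rows / groups;
}

/*
 * Hypothetical helper, not part of the patch: how many rows the join no
 * longer has to process once the partial aggregate sits below it.
 */
double
rows_removed_before_join(double input_rows, double groups)
{
    assert(groups > 0.0);
    assert(input_rows >= groups);
    return input_rows - groups;
}
```

With the "t2.c > 5" estimates, row_reduction_factor(995, 10) is 99.5
and 985 rows are removed below the join. With the "t2.c = 5" estimates,
row_reduction_factor(1, 1) is 1 and no rows are removed, so a partial
aggregation has nothing to gain there.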
> Is this behavior correct? If so, would it be possible to check for
> this limitation in setup_eager_aggregation() and maybe skip all the
> other work?
Hmm, I wouldn't consider this a limitation; it's just the result of
the planner's cost-based tournament for path selection.
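As a rough illustration of what "tournament" means here, the following
C sketch picks the cheapest of several candidates. It is a
simplification under stated assumptions, not the planner's code:
CandidatePath and cheapest_candidate() are hypothetical names, and the
real planner compares paths on more than total cost (sort order and
parallel safety, for instance).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a planner path: a label and an estimated cost. */
typedef struct CandidatePath
{
    const char *label;
    double      total_cost;
} CandidatePath;

/*
 * Return the candidate with the lowest total cost, or NULL if the array
 * is empty.  On a tie, the earlier entry wins.
 */
const CandidatePath *
cheapest_candidate(const CandidatePath *paths, size_t npaths)
{
    const CandidatePath *best = NULL;

    assert(paths != NULL || npaths == 0);

    for (size_t i = 0; i < npaths; i++)
    {
        if (best == NULL || paths[i].total_cost < best->total_cost)
            best = &paths[i];
    }
    return best;
}
```

An eager-aggregation candidate entered into such a comparison wins only
if its estimated cost is lower than that of the plain
aggregate-over-join candidate, which is why being applicable does not
guarantee being chosen.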
Thanks
Richard
* Re: Eager aggregation, take 3
@ 2025-09-01 01:32 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
1 sibling, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-09-01 01:32 UTC (permalink / raw)
To: Matheus Alcantara <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Sat, Aug 9, 2025 at 10:32 AM Richard Guo <[email protected]> wrote:
> OK. Here it is.
This patch needs a rebase; here it is. No changes were made.
- Richard
Attachments:
[application/octet-stream] v20-0001-Implement-Eager-Aggregation.patch (171.8K, 2-v20-0001-Implement-Eager-Aggregation.patch)
download | inline diff:
From 63378cda1912f8bca3455e374638ba02ce1ad651 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v20] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 89 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 453 ++++++
src/backend/optimizer/path/joinrels.c | 193 +++
src/backend/optimizer/plan/initsplan.c | 322 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/relnode.c | 628 ++++++++
src/backend/utils/misc/guc_tables.c | 21 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 6 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1334 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 194 +++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
26 files changed, 3653 insertions(+), 74 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 78b8367d289..b6c892bdb51 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,30 +3701,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0a4b3e55ba5..aab91625daf 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5475,6 +5475,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6095,6 +6110,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 843368096fd..5af3ced5750 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1500,3 +1500,92 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, we partition the tables
+in the FROM clause into two groups: those that contain at least one
+aggregation column, and those that do not contain any aggregation
+columns. Each group can be treated as a single relation formed by the
+Cartesian product of the tables within that group. Therefore, without
+loss of generality, we can assume that the FROM clause contains
+exactly two relations, R1 and R2, where R1 represents the relation
+containing all aggregation columns, and R2 represents the relation
+without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+One restriction is that we cannot push partial aggregation down to a
+relation that is in the nullable side of an outer join, because the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..7b349a4570e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +334,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +603,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1358,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3418,328 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3899,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3923,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4813,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 535248aa525..04cbbcea2a4 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -36,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -762,6 +766,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -873,6 +881,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..9cc8c558ccf 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +633,323 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Checks if any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate uses INTERNAL transition type. Although INTERNAL
+ * is marked as pass-by-value, it usually points to a large internal data
+ * structure (like those used by string_agg or array_agg). These transition
+ * states can grow large and their size is hard to estimate. Applying eager
+ * aggregation in such cases risks high memory usage since partial aggregation
+ * results might be stored in join hash tables or materialized nodes.
+ *
+ * We explicitly exclude aggregates with the F_NUMERIC_AVG_ACCUM transition
+ * function from this check, based on the assumption that avg(numeric) and
+ * sum(numeric) are safe in this context.
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM)
+ continue;
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as a grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41bd8353430..462c5335589 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -232,7 +232,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4010,9 +4009,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4094,23 +4091,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7055,16 +7045,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7177,7 +7193,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7203,7 +7219,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7245,19 +7261,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7297,6 +7311,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7307,6 +7322,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7328,11 +7352,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7557,6 +7583,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8120,13 +8191,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0e523d2eb5b..faa44e46594 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -278,6 +292,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -408,6 +424,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions, otherwise we cannot reach here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -759,6 +872,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -945,6 +1060,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2523,3 +2640,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must already
+ * have been created if that was possible. So we can just use the parent's
+ * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent these grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than min_eager_agg_group_size.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is on the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it via an EquivalenceClass
+ * (otherwise, it would have already been included in the targets
+ * above). We need to create a special SortGroupClause for it.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index f137129209f..d3bfcaf0784 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -965,6 +965,16 @@ struct config_bool ConfigureNamesBool[] =
NULL, NULL, NULL
},
+ {
+ {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD,
+ gettext_noop("Enables eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &enable_eager_aggregate,
+ true,
+ NULL, NULL, NULL
+ },
{
{"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD,
gettext_noop("Enables the planner's use of parallel append plans."),
@@ -4050,6 +4060,17 @@ struct config_real ConfigureNamesReal[] =
NULL, NULL, NULL
},
+ {
+ {"min_eager_agg_group_size", PGC_USERSET, QUERY_TUNING_COST,
+ gettext_noop("Sets the minimum average group size required to consider applying eager aggregation."),
+ NULL,
+ GUC_EXPLAIN
+ },
+ &min_eager_agg_group_size,
+ 8.0, 0.0, DBL_MAX,
+ NULL, NULL, NULL
+ },
+
{
{"cursor_tuple_fraction", PGC_USERSET, QUERY_TUNING_OTHER,
gettext_noop("Sets the planner's estimate of the fraction of "
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a9d8293474a..e3cdfe11992 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4a903d1ec18..ad211207343 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -397,6 +397,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupingExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1046,6 +1055,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1130,6 +1147,75 @@ typedef struct RelOptInfo
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* the grouped paths are considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3283,6 +3369,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
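
The comments on RelAggInfo.agg_useful and AggClauseInfo.agg_eval_at in the pathnodes.h hunks above describe two checks. The sketch below restates them in standalone C; it is not code from the patch. RelidSet is a hypothetical bitmask standing in for Relids, both helper names are invented for illustration, and the "input rows divided by grouped rows" formula is an assumption about how the average partial group size is derived. Only the 8.0 threshold and the "no less than" comparison come from the patch text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: bit i set means RT index i is a member. */
typedef uint32_t RelidSet;

/* Stand-in for the GUC; 8.0 is the boot value shown in the guc_tables.c hunk. */
static double min_eager_agg_group_size = 8.0;

/*
 * Hypothetical helper: an aggregate can be evaluated at a relation only if
 * every relation it references (agg_eval_at) is part of that relation.
 */
static bool
agg_can_be_evaluated_at(RelidSet agg_eval_at, RelidSet rel_relids)
{
	return (agg_eval_at & ~rel_relids) == 0;
}

/*
 * Hypothetical helper: grouped paths count as useful when the average
 * partial group size is no less than min_eager_agg_group_size.  The
 * average is assumed here to be input rows divided by grouped rows.
 */
static bool
partial_agg_looks_useful(double input_rows, double grouped_rows)
{
	if (input_rows <= 0.0 || grouped_rows <= 0.0)
		return false;
	return input_rows / grouped_rows >= min_eager_agg_group_size;
}
```

With the regression-test data further down, eager_agg_t2 holds 1000 rows and 10 distinct values of b, so partial_agg_looks_useful(1000.0, 10.0) is true (average group size 100). Grouping eager_agg_t1 on b, which has 1000 distinct values, gives an average group size of 1 and would be rejected.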
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 763cd25bb3c..5b9c1daf14b 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -312,6 +312,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -351,4 +355,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cbade77b717..8d03d662a04 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..f02ff0b30a3
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1334 @@
+--
+-- EAGER AGGREGATION
+-- Test that we can push aggregation down below a join
+--
+-- Enable eager aggregation explicitly, so the test does not depend on the default.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches, full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match, partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 04079268b98..d0bb66f43da 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2837,20 +2837,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 5f2c0cf5786..1f56f55155b 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..5da8749a6cb
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,194 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..9a4567db01a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2473,6 +2475,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.39.5 (Apple Git-154)
* Re: Eager aggregation, take 3
@ 2025-09-05 07:35 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-09-05 07:35 UTC (permalink / raw)
To: Matheus Alcantara <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Mon, Sep 1, 2025 at 10:32 AM Richard Guo <[email protected]> wrote:
> This patch needs a rebase; here it is. No changes were made.
Here is a rebase after the GUC tables change.
- Richard
Attachments:
[application/octet-stream] v21-0001-Implement-Eager-Aggregation.patch (172.1K, 2-v21-0001-Implement-Eager-Aggregation.patch)
From 3f839b71eb76f9e662f0768ad2aff600d500748f Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v21] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
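To make the idea concrete, here is a minimal toy sketch of the two
plan shapes. It is not planner code: the row types, the function
names, and the MAX_KEYS bound are all hypothetical and exist only for
illustration. It computes SUM(val) GROUP BY grp over a two-relation
equijoin, once by joining first and once by partially aggregating the
relation that carries val, grouped by its join key, before the join.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 8	/* toy assumption: all keys lie in [0, MAX_KEYS) */

typedef struct { int join_key; long val; } AggRow;	/* carries the aggregated column */
typedef struct { int join_key; int grp; } GroupRow;	/* carries the GROUP BY column */

/* Lazy plan: join first, then compute SUM(val) GROUP BY grp. */
void
sum_lazy(const AggRow *r1, size_t n1, const GroupRow *r2, size_t n2,
		 long out[MAX_KEYS])
{
	for (int g = 0; g < MAX_KEYS; g++)
		out[g] = 0;

	for (size_t i = 0; i < n1; i++)
		for (size_t j = 0; j < n2; j++)
			if (r1[i].join_key == r2[j].join_key)
			{
				assert(r2[j].grp >= 0 && r2[j].grp < MAX_KEYS);
				out[r2[j].grp] += r1[i].val;
			}
}

/*
 * Eager plan: partially aggregate r1 grouped by its join key, join the
 * partial sums to r2, then finalize by summing the partial sums per grp.
 */
void
sum_eager(const AggRow *r1, size_t n1, const GroupRow *r2, size_t n2,
		  long out[MAX_KEYS])
{
	long		partial[MAX_KEYS] = {0};
	int			seen[MAX_KEYS] = {0};

	/* Partial aggregation below the join: one row per distinct join key. */
	for (size_t i = 0; i < n1; i++)
	{
		assert(r1[i].join_key >= 0 && r1[i].join_key < MAX_KEYS);
		partial[r1[i].join_key] += r1[i].val;
		seen[r1[i].join_key] = 1;
	}

	for (int g = 0; g < MAX_KEYS; g++)
		out[g] = 0;

	/* Join the (fewer) partially aggregated rows, then finalize. */
	for (int k = 0; k < MAX_KEYS; k++)
	{
		if (!seen[k])
			continue;
		for (size_t j = 0; j < n2; j++)
			if (r2[j].join_key == k)
			{
				assert(r2[j].grp >= 0 && r2[j].grp < MAX_KEYS);
				out[r2[j].grp] += partial[k];
			}
	}
}
```

Both functions fill out[] with the same values for any input within
the key bound. The eager variant feeds one row per distinct join key
into the join instead of every row of r1, which is where the row
count reduction comes from.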
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
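As a rough illustration of what "a significant reduction in row
count" could mean, the sketch below compares the average group size
against a threshold such as the min_eager_agg_group_size GUC added by
this patch. The function name and the exact comparison are
assumptions made for illustration; the actual test lives in the patch
itself and may differ.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: report whether partial aggregation at this level
 * shrinks the input enough to be worth considering.  "Average group size"
 * is taken to be input rows divided by the estimated number of groups.
 */
bool
eager_agg_looks_useful(double input_rows, double num_groups,
					   double min_group_size)
{
	assert(input_rows >= 0.0);

	/* Without a positive group estimate there is nothing to compare. */
	if (num_groups <= 0.0)
		return false;

	return (input_rows / num_groups) >= min_group_size;
}
```

With the documented default threshold of 8, an input of 1000 rows
estimated to form 30 groups would qualify under this sketch, while
1000 rows forming 500 groups would not.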
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This guarantees that an aggregated
row from the partial aggregation matches the other side of the join
if and only if every row in its partial group does. In other words,
all rows within the same partial group share the same "destiny",
which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is on the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
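The toy model below sketches why this restriction matters. It is an
illustration under simplifying assumptions, not planner code: keys
are small integers, the function names are hypothetical, and the
query modelled is count(*) grouped by the outer key over
outer_rel LEFT JOIN inner_rel. In the invalid variant, a partial
count is computed on the nullable side before the join, so an outer
row with no match has no partial count to finalize.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEYS 8	/* toy assumption: all keys lie in [0, MAX_KEYS) */

/* Lazy plan: left join first, then count(*) per outer key. */
void
count_left_join_lazy(const int *outer_keys, size_t n_outer,
					 const int *inner_keys, size_t n_inner,
					 long out[MAX_KEYS])
{
	for (int k = 0; k < MAX_KEYS; k++)
		out[k] = 0;

	for (size_t i = 0; i < n_outer; i++)
	{
		long		matches = 0;

		assert(outer_keys[i] >= 0 && outer_keys[i] < MAX_KEYS);
		for (size_t j = 0; j < n_inner; j++)
			if (inner_keys[j] == outer_keys[i])
				matches++;

		/* An unmatched outer row still yields one NULL-extended row. */
		out[outer_keys[i]] += (matches > 0) ? matches : 1;
	}
}

/*
 * Invalid eager plan: partial count(*) on the nullable side, grouped by
 * the join key, then left join, then sum the partial counts.
 */
void
count_left_join_eager_invalid(const int *outer_keys, size_t n_outer,
							  const int *inner_keys, size_t n_inner,
							  long out[MAX_KEYS])
{
	long		partial[MAX_KEYS] = {0};

	for (size_t j = 0; j < n_inner; j++)
	{
		assert(inner_keys[j] >= 0 && inner_keys[j] < MAX_KEYS);
		partial[inner_keys[j]]++;
	}

	for (int k = 0; k < MAX_KEYS; k++)
		out[k] = 0;

	for (size_t i = 0; i < n_outer; i++)
	{
		assert(outer_keys[i] >= 0 && outer_keys[i] < MAX_KEYS);
		/* Unmatched outer rows add nothing: the NULL-extended row is lost. */
		out[outer_keys[i]] += partial[outer_keys[i]];
	}
}
```

For any outer key without a match on the inner side, the lazy
variant reports a count of 1 while the invalid eager variant reports
0, so the two plans are not equivalent in this model.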
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo, Antonin Houska
Reviewed-by: Robert Haas, Jian He, Tender Wang, Paul George, Tom Lane
Reviewed-by: Tomas Vondra, Andy Fan, Ashutosh Bapat
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 89 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 453 ++++++
src/backend/optimizer/path/joinrels.c | 193 +++
src/backend/optimizer/plan/initsplan.c | 322 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/relnode.c | 628 ++++++++
src/backend/utils/misc/guc_parameters.dat | 16 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 6 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1334 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 194 +++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
26 files changed, 3648 insertions(+), 74 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 78b8367d289..b6c892bdb51 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,30 +3701,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0a4b3e55ba5..aab91625daf 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5475,6 +5475,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6095,6 +6110,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 843368096fd..5af3ced5750 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1500,3 +1500,92 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, we partition the tables
+in the FROM clause into two groups: those that contain at least one
+aggregation column, and those that do not contain any aggregation
+columns. Each group can be treated as a single relation formed by the
+Cartesian product of the tables within that group. Therefore, without
+loss of generality, we can assume that the FROM clause contains
+exactly two relations, R1 and R2, where R1 represents the relation
+containing all aggregation columns, and R2 represents the relation
+without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+One restriction is that we cannot push partial aggregation down to a
+relation that is on the nullable side of an outer join, because the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..7b349a4570e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +334,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +603,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1358,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3418,328 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3899,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3923,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4813,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 535248aa525..04cbbcea2a4 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -36,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -762,6 +766,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -873,6 +881,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..9cc8c558ccf 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +633,323 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by the user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Checks if any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate uses INTERNAL transition type. Although INTERNAL
+ * is marked as pass-by-value, it usually points to a large internal data
+ * structure (like those used by string_agg or array_agg). These transition
+ * states can grow large and their size is hard to estimate. Applying eager
+ * aggregation in such cases risks high memory usage since partial aggregation
+ * results might be stored in join hash tables or materialized nodes.
+ *
+ * We explicitly exclude aggregates whose transition function is
+ * F_NUMERIC_AVG_ACCUM from this check, based on the assumption that
+ * avg(numeric) and sum(numeric) are safe in this context.
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM)
+ continue;
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as a grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41bd8353430..462c5335589 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -232,7 +232,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4010,9 +4009,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4094,23 +4091,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7055,16 +7045,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7177,7 +7193,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7203,7 +7219,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7245,19 +7261,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7297,6 +7311,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7307,6 +7322,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7328,11 +7352,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7557,6 +7583,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8120,13 +8191,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0e523d2eb5b..faa44e46594 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -278,6 +292,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -408,6 +424,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions, otherwise we cannot reach here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Track the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat-copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -759,6 +872,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -945,6 +1060,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2523,3 +2640,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must already
+ * have been created if that was possible. So we can just use the parent's
+ * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent these grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than min_eager_agg_group_size.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is in the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in the
+ * GROUP BY clause nor derivable from it via EquivalenceClasses
+ * (otherwise, it would have already been included in the targets
+ * above). We need to create a special SortGroupClause for it.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
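
For readers following the logic of is_var_needed_by_join() above, here is a
minimal standalone sketch of the same set test. It is not part of the patch.
It assumes a uint64_t bitmask as a stand-in for Relids (the real code uses
Bitmapset and bms_nonempty_difference()), and the names relid_mask,
relid_bit() and var_needed_by_upper_join() are hypothetical. Bit 0 plays the
role of "relation 0", i.e. the final targetlist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: bit N set means RT index N is a member. */
typedef uint64_t relid_mask;

/* Build a one-member set; only RT indexes 0..63 fit in this toy model. */
relid_mask
relid_bit(int relid)
{
	assert(relid >= 0 && relid < 64);
	return (relid_mask) 1 << relid;
}

/*
 * Sketch of the test in is_var_needed_by_join(): the Var is needed by a
 * join above the current rel iff its attr_needed set has a member that is
 * neither part of the current rel nor "relation 0" (the final targetlist).
 */
bool
var_needed_by_upper_join(relid_mask attr_needed, relid_mask rel_relids)
{
	relid_mask	ignore = rel_relids | relid_bit(0);

	return (attr_needed & ~ignore) != 0;
}
```

Under these assumptions, a Var of relation 2 whose attr_needed is {1, 2} is
needed by an upper join when the current rel is {2}, but not once the current
rel is {1, 2}. A Var needed only at "relation 0" is never reported as needed
by a join.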
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index a157cec3c4d..466aabb8cf0 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -145,6 +145,13 @@
boot_val => 'false',
},
+{ name => 'enable_eager_aggregate', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+ short_desc => 'Enables eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'enable_eager_aggregate',
+ boot_val => 'true',
+},
+
{ name => 'enable_parallel_append', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
short_desc => 'Enables the planner\'s use of parallel append plans.',
flags => 'GUC_EXPLAIN',
@@ -2421,6 +2428,15 @@
max => 'DBL_MAX',
},
+{ name => 'min_eager_agg_group_size', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_COST',
+ short_desc => 'Sets the minimum average group size required to consider applying eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'min_eager_agg_group_size',
+ boot_val => '8.0',
+ min => '0.0',
+ max => 'DBL_MAX',
+},
+
{ name => 'cursor_tuple_fraction', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_OTHER',
short_desc => 'Sets the planner\'s estimate of the fraction of a cursor\'s rows that will be retrieved.',
flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index a9d8293474a..e3cdfe11992 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
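
To make the meaning of min_eager_agg_group_size concrete, here is a small
standalone sketch of the usefulness test that create_rel_agg_info() applies.
It is not part of the patch. The helper name eager_agg_is_useful() is
hypothetical, and the assertion that the group estimate is positive is an
assumption of this sketch, not something the patch checks.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the agg_useful computation: grouped paths
 * are considered useful iff the average group size (rows / grouped_rows)
 * is no less than the configured threshold.
 */
bool
eager_agg_is_useful(double rows, double grouped_rows, double min_group_size)
{
	/* Assumption of this sketch: the group-count estimate is never zero. */
	assert(grouped_rows > 0.0);

	return (rows / grouped_rows) >= min_group_size;
}
```

As an illustration with exact counts rather than planner estimates: the
regression test's eager_agg_t2 has 1000 rows and b = i%10, so grouping by b
gives 10 groups and an average group size of 100, which passes the default
threshold of 8.0. eager_agg_t1 has b = i, so grouping by b gives 1000 groups
and an average group size of 1, which does not.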
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4a903d1ec18..ad211207343 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -397,6 +397,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupingExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1046,6 +1055,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1130,6 +1147,75 @@ typedef struct RelOptInfo
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* the grouped paths are considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3283,6 +3369,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
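
The "target" of a RelAggInfo carries partial aggregates that are finalized
only after the remaining joins. The following standalone sketch, which is not
part of the patch, shows why that split is safe for avg(). It is a
simplification: the AvgState struct and the avg_partial(), avg_combine() and
avg_finalize() helpers are hypothetical and do not reflect PostgreSQL's actual
transition-state layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical partial state for avg(): enough to combine and finalize. */
typedef struct AvgState
{
	double		sum;
	long		count;
} AvgState;

/* Partial aggregation: fold one group's input rows into a state. */
AvgState
avg_partial(const double *vals, size_t n)
{
	AvgState	s = {0.0, 0};

	for (size_t i = 0; i < n; i++)
	{
		s.sum += vals[i];
		s.count++;
	}
	return s;
}

/* Combine two partial states, as the finalizing aggregate would. */
AvgState
avg_combine(AvgState a, AvgState b)
{
	AvgState	s = {a.sum + b.sum, a.count + b.count};

	return s;
}

/* Finalize: produce the average from the combined state. */
double
avg_finalize(AvgState s)
{
	assert(s.count > 0);		/* sketch ignores the empty-input case */
	return s.sum / (double) s.count;
}
```

Combining the partial states of {1, 2} and {3, 4, 5} and finalizing gives the
same result as averaging all five values at once, which is the property the
Partial/Finalize plan shapes rely on.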
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 763cd25bb3c..5b9c1daf14b 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -312,6 +312,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -351,4 +355,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cbade77b717..8d03d662a04 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..f02ff0b30a3
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1334 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 04079268b98..d0bb66f43da 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2837,20 +2837,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 5f2c0cf5786..1f56f55155b 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..5da8749a6cb
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,194 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..9a4567db01a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2473,6 +2475,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.39.5 (Apple Git-154)
* Re: Eager aggregation, take 3
@ 2025-09-05 13:09 Robert Haas <[email protected]>
parent: Richard Guo <[email protected]>
1 sibling, 1 reply; 55+ messages in thread
From: Robert Haas @ 2025-09-05 13:09 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
Sorry for the slow response.
On Fri, Jun 13, 2025 at 3:42 AM Richard Guo <[email protected]> wrote:
> The transformation of eager aggregation is:
>
> GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
> =
> GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1)
> JOIN R2 ON J)
>
> This equivalence holds under the following conditions:
>
> 1) AGG is decomposable, meaning that it can be computed in two stages:
> a partial aggregation followed by a final aggregation;
> 2) The set G1 used in the pre-aggregation of R1 includes:
> * all columns from R1 that are part of the grouping keys G, and
> * all columns from R1 that appear in the join condition J.
> 3) The grouping operator for any column in G1 must be compatible with
> the operator used for that column in the join condition J.
This proof seems to ignore join-order constraints. I'm not sure to
what degree that influences the ultimate outcome here, but given A
LEFT JOIN (B INNER JOIN C), we cannot simply decide that A and C
comprise R1 and B comprises R2, because it is not actually possible to
do the A-C join first and treat the result as a relation to be joined
to B. That said, I do very much like the explicit enumeration of
criteria that must be met for the optimization to be valid. That makes
it a lot easier to evaluate whether the theory of the patch is
correct.
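As a rough illustration of the quoted equivalence, the transformation
can be written out by hand in SQL. This is only a sketch under stated
assumptions: the tables r1 and r2 and their columns are hypothetical
and not part of the patch, AGG is sum and count (both decomposable),
and G1 = {g1, j} covers R1's grouping column plus its join column. It
says nothing about the join-order constraints raised above.

```sql
-- Hypothetical tables for illustration only; not part of the patch.
CREATE TEMP TABLE r1 (g1 int, j int, a int);
CREATE TEMP TABLE r2 (g2 int, j int);
INSERT INTO r1 SELECT i % 3, i % 5, i FROM generate_series(1, 100) i;
INSERT INTO r2 SELECT i % 4, i % 5 FROM generate_series(1, 20) i;

-- Original form: aggregate after the join, G = {r1.g1, r2.g2}.
SELECT r1.g1, r2.g2, sum(r1.a) AS s, count(*) AS n
FROM r1 JOIN r2 ON r1.j = r2.j
GROUP BY r1.g1, r2.g2
ORDER BY 1, 2;

-- Hand-written eager form: pre-aggregate r1 on G1 = {g1, j}, join the
-- partial results to r2, then finalize by summing the partial values.
SELECT p.g1, r2.g2, sum(p.s) AS s, sum(p.n) AS n
FROM (SELECT g1, j, sum(a) AS s, count(*) AS n
      FROM r1
      GROUP BY g1, j) p
JOIN r2 ON p.j = r2.j
GROUP BY p.g1, r2.g2
ORDER BY 1, 2;
```

Both queries should return the same groups and values. The finalized
columns come back as numeric rather than bigint, because they are sums
of bigint partial results.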
> To address these concerns, I'm thinking that maybe we can adopt a
> strategy where partial aggregation is only pushed to the lowest
> possible level in the join tree that is deemed useful. In other
> words, if we can build a grouped path like "AGG(B) JOIN A" -- and
> AGG(B) yields a significant reduction in row count -- we skip
> exploring alternatives like "AGG(A JOIN B)".
I really like this idea. I believe we need some heuristic here and
this seems like a reasonable one. I think there could be a better one,
potentially. For instance, it would be reasonable (in my opinion) to
do some kind of evaluation of AGG(A JOIN B) vs. AGG(B) JOIN A that
does not involve performing full path generation for both cases; e.g.
one could try to decide considering only row counts.
However, I'm not saying that would work better than your proposal
here, or that it should be a requirement for this to be committed;
it's just an idea. IMHO, the requirement to have something committable
is that there is SOME heuristic limiting the search space and at the
same time the patch can still be demonstrated to give SOME benefit. I
think what you propose here meets those criteria. I also like the fact
that it's simple and easy to understand. If it does go wrong, it will
not be too difficult for someone to understand why it has gone wrong,
which is very desirable.
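To make the row-count idea concrete, here is a small sketch that
compares the two placements using nothing but counts. The tables a and
b are hypothetical, and the query uses exact counts where the planner
would only have estimates, so this shows the shape of the comparison
rather than anything the patch actually does.

```sql
-- Hypothetical tables for illustration only; not part of the patch.
CREATE TEMP TABLE a (id int, k int);
CREATE TEMP TABLE b (k int, v int);
INSERT INTO a SELECT i, i % 10 FROM generate_series(1, 100) i;
INSERT INTO b SELECT i % 10, i FROM generate_series(1, 1000) i;

-- Query being planned (as part of some larger join):
--   SELECT a.id, sum(b.v) FROM a JOIN b ON a.k = b.k GROUP BY a.id;
SELECT
  -- rows consumed by AGG(B)
  (SELECT count(*) FROM b) AS b_rows,
  -- rows produced by AGG(B), grouped by the join column k
  (SELECT count(DISTINCT k) FROM b) AS agg_b_rows,
  -- rows consumed by AGG(A JOIN B)
  (SELECT count(*) FROM a JOIN b ON a.k = b.k) AS a_join_b_rows,
  -- rows produced by AGG(A JOIN B), grouped by a.id
  (SELECT count(DISTINCT a.id) FROM a JOIN b ON a.k = b.k)
      AS agg_a_join_b_rows;
```

With this data, AGG(B) turns 1000 rows into 10 before the join, while
AGG(A JOIN B) has to consume 10000 joined rows to produce 100.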
> I think this heuristic serves as a good starting point, and we can
> look into extending it with more advanced strategies as the feature
> evolves.
So IOW, +1 to what you say here.
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: Eager aggregation, take 3
@ 2025-09-05 13:12 Robert Haas <[email protected]>
parent: Richard Guo <[email protected]>
2 siblings, 1 reply; 55+ messages in thread
From: Robert Haas @ 2025-09-05 13:12 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Wed, Aug 6, 2025 at 3:52 AM Richard Guo <[email protected]> wrote:
> To avoid potential memory blowout risks from large partial aggregation
> values, v18 avoids applying eager aggregation if any aggregate uses an
> INTERNAL transition type, as this typically indicates a large internal
> data structure (as in string_agg or array_agg). However, this also
> excludes aggregates like avg(numeric) and sum(numeric), which are
> actually safe to use with eager aggregation.
>
> What we really want to exclude are aggregate functions that can
> produce large transition values by accumulating or concatenating input
> rows. So I'm wondering if we could instead check the transfn_oid
> directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and
> F_STRING_AGG_TRANSFN. We don't need to worry about json_agg,
> jsonb_agg, or xmlagg, since they don't support partial aggregation
> anyway.
This strategy seems fairly unfriendly towards out-of-core code. Can
you come up with something that allows the author of a SQL-callable
function to include or exclude the function by a choice that is under
their control, rather than hard-coding something in PostgreSQL itself?
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: Eager aggregation, take 3
@ 2025-09-05 14:37 Robert Haas <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Robert Haas @ 2025-09-05 14:37 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Matheus Alcantara <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Fri, Sep 5, 2025 at 3:35 AM Richard Guo <[email protected]> wrote:
> Here is a rebase after the GUC tables change.
I spent a bit of time scrolling through this today. Here are a few
observations/review comments.
It looks as though this will create a bunch of RelOptInfo objects that
don't end up getting used for anything once the apply_at test in
generate_grouped_paths() fails. It seems to me that it would be better
to altogether avoid generating the RelOptInfo in that case.
I think it would be worth considering generating the partially grouped
relations in a second pass. Right now, as you progress from the bottom
of the join tree towards the top, you create grouped rels as you go.
But you could equally well finish planning everything up to the
scan/join target first and then go back and add grouped_rels to
relations where it seems worthwhile. I don't know if this would really
make a big difference as you have things today, but I think it might
provide a better structure for the future, because you would then
have a lot more information with which to judge where to do
aggregation. For instance, you could look at the row counts of any
number of those ungrouped-rels before deciding where to put the
partial aggregation. That seems like it could be pretty valuable.
I haven't done a detailed comparison of generate_grouped_paths() to
other parts of the code, but I have an uncomfortable feeling that it
might be rather similar to existing code that is probably already
present in multiple, slightly different versions. Is there any
refactoring we could do here?
Do you need a test of this feature in combination with GEQO? You have
code for it but I don't immediately see a test. I didn't check
carefully, though.
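For what it's worth, a GEQO test could look roughly like the sketch
below. It reuses the table definitions, data, and three-way join from
eager_aggregate.sql in the patch (as temp tables here, to keep the
sketch self-contained), and forces the genetic optimizer with the
existing geqo and geqo_threshold settings. enable_eager_aggregate is
the GUC added by the patch, so this only runs on a patched build, and
the expected EXPLAIN output would have to be captured from an actual
run.

```sql
-- Same definitions and data as in eager_aggregate.sql from the patch,
-- created as temp tables for this sketch.
CREATE TEMP TABLE eager_agg_t1 (a int, b int, c double precision);
CREATE TEMP TABLE eager_agg_t2 (a int, b int, c double precision);
CREATE TEMP TABLE eager_agg_t3 (a int, b int, c double precision);

INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;

ANALYZE eager_agg_t1;
ANALYZE eager_agg_t2;
ANALYZE eager_agg_t3;

-- GUC added by the patch; unknown to an unpatched server.
SET enable_eager_aggregate TO on;
-- Use GEQO for any query with at least two FROM items.
SET geqo TO on;
SET geqo_threshold TO 2;

EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;

RESET geqo;
RESET geqo_threshold;
RESET enable_eager_aggregate;
```

The query results should match the non-GEQO run regardless of which
plan GEQO picks; only the EXPLAIN output is plan-dependent.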
Overall I like the direction this is heading. I don't feel
well-qualified to evaluate whether all of the things that you're doing
are completely safe. The logic in is_var_in_aggref_only() and
is_var_needed_by_join() scares me a bit because I worry that the
checks are somehow non-exhaustive, but I don't know of a specific
hazard. That said, I think that modulo such issues, this has a good
chance of significantly improving performance for certain query
shapes.
One thing to check might be whether you can construct any cases where
the strategy is applied too boldly. Given the safeguards you've put in
place that seems a little hard to construct. The most obvious
thing that occurs to me is an aggregate where combining is more
expensive than aggregating, so that the partial aggregation gives the
appearance of saving more work than it really does, but I can't
immediately think of a problem case. Another case could be where the
row counts are off, leading to us mistakenly believing that we're
going to reduce the number of rows that need to be processed when we
really don't. Of course, such a case would arguably be a fault of the
bad row-count estimate rather than this patch, but if the patch has
that problem frequently, it might need to be addressed. Still, I have
a feeling that the testing you've already been doing might have
surfaced such cases if they were common. Have you looked into how many
queries in the regression tests, or in TPC-H/DS, expend significant
planning effort on this strategy before discarding it? That might be a
good way to get a sense of whether the patch is too aggressive, not
aggressive enough, a mix of the two, or just right.
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: Eager aggregation, take 3
@ 2025-09-05 14:50 Robert Haas <[email protected]>
parent: Richard Guo <[email protected]>
2 siblings, 1 reply; 55+ messages in thread
From: Robert Haas @ 2025-09-05 14:50 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Wed, Aug 6, 2025 at 3:52 AM Richard Guo <[email protected]> wrote:
> Looking at TPC-DS queries 4 and 11, a threshold of 10 is the minimum
> needed to consider eager aggregation for them. The resulting plans
> show nice performance improvements without any measurable increase in
> planning time. So, I'm inclined to lower the threshold to 10 for now.
> (Wondering whether we should make this threshold a GUC, so users can
> adjust it based on their needs.)
Like Matheus, I think a GUC is reasonable. A significant danger here
appears to be the possibility of a performance cliff, where queries
are optimized very differently when the ratio is 9.99 vs. 10.01, say. It
would be nice if there were some way to mitigate that danger, but at
least a GUC avoids chaining the performance of the whole system to a
hard-coded value.
It might be worth considering whether there are heuristics other than
the group size that could help here. Possibly that's just making
things more complicated to no benefit. It seems to me, for example,
that reducing 100 rows to 10 is quite different from reducing a
million rows to 100,000. On the whole, the latter seems more likely to
work out well, but it's tricky, because the effort expended per group
can be arbitrarily high. I think we do want to let the cost model make
most of the decisions, and just use this threshold to prune ideas that
are obviously bad at an early stage. That said, it's worth thinking
about how this interacts with the just-considered-one-eager-agg
strategy. Does this threshold apply before or after that rule?
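To make the two concerns above concrete, here is a minimal sketch. It
assumes the pruning test compares the average group size (input rows
divided by estimated groups) against a cutoff. The function names, the
min_group_size parameter, and the use of >= are illustrative assumptions,
not taken from the patch.

```python
def average_group_size(input_rows: float, groups: float) -> float:
    """Average number of input rows collapsed into each partial group."""
    return input_rows / groups


def passes_threshold(input_rows: float, groups: float,
                     min_group_size: float = 10.0) -> bool:
    """Hypothetical pruning test: keep the eager-aggregation idea only if
    the average group size reaches the cutoff."""
    return average_group_size(input_rows, groups) >= min_group_size


# Both reductions have the same ratio, so a ratio-only test cannot
# distinguish 100 -> 10 from 1,000,000 -> 100,000.
small = average_group_size(100, 10)
large = average_group_size(1_000_000, 100_000)

# The cliff: nearly identical estimates land on opposite sides of the cutoff.
below = passes_threshold(999, 100)     # ratio 9.99
above = passes_threshold(1001, 100)    # ratio 10.01
```

In this toy model the cutoff only decides whether the idea is considered
at all; choosing between the surviving alternatives would still be left
to the cost model.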
For instance, consider AGG(FACT_TABLE JOIN DIMENSION_TABLE), like a
count of orders grouped by customer name. Aggregating on the dimension
table (in this case, the list of customers) is probably useless, but
aggregating on the join column of the fact table has a good chance of
being useful. If we consider only one of those strategies, we want it
to be the right one. This threshold could be the thing that helps us
to get it right.
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: Eager aggregation, take 3
@ 2025-09-09 09:07 Richard Guo <[email protected]>
parent: Robert Haas <[email protected]>
0 siblings, 0 replies; 55+ messages in thread
From: Richard Guo @ 2025-09-09 09:07 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Fri, Sep 5, 2025 at 10:10 PM Robert Haas <[email protected]> wrote:
> On Fri, Jun 13, 2025 at 3:42 AM Richard Guo <[email protected]> wrote:
> > The transformation of eager aggregation is:
> >
> > GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
> > =
> > GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1)
> > JOIN R2 ON J)
> >
> > This equivalence holds under the following conditions:
> >
> > 1) AGG is decomposable, meaning that it can be computed in two stages:
> > a partial aggregation followed by a final aggregation;
> > 2) The set G1 used in the pre-aggregation of R1 includes:
> > * all columns from R1 that are part of the grouping keys G, and
> > * all columns from R1 that appear in the join condition J.
> > 3) The grouping operator for any column in G1 must be compatible with
> > the operator used for that column in the join condition J.
> This proof seems to ignore join-order constraints. I'm not sure to
> what degree that influences the ultimate outcome here, but given A
> LEFT JOIN (B INNER JOIN C), we cannot simply decide that A and C
> comprise R1 and B comprises R2, because it is not actually possible to
> do the A-C join first and treat the result as a relation to be joined
> to B. That said, I do very much like the explicit enumeration of
> criteria that must be met for the optimization to be valid. That makes
> it a lot easier to evaluate whether the theory of the patch is
> correct.
Thanks for pointing this out. I should have clarified that the proof
is intended for the inner join case. My plan was to first establish
the correctness for inner joins, and then extend the proof to cover
outer joins, but I failed to make that clear.
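For the inner join case, the equivalence can be checked on a toy example.
The sketch below is only an illustration with made-up data: R1 rows are
(j, g, a), R2 rows are (j,), the aggregate is SUM, and G1 = (j, g), i.e.
the grouping column from R1 plus the join column. None of the names come
from the patch.

```python
from collections import defaultdict

# Hypothetical data: R1 rows are (j, g, a), R2 rows are (j,).
R1 = [(1, "x", 10), (1, "x", 20), (1, "y", 5), (2, "x", 7), (3, "y", 1)]
R2 = [(1,), (1,), (2,)]


def lazy_sum(r1, r2):
    """GROUP BY g, SUM(a) over (R1 JOIN R2 ON R1.j = R2.j)."""
    out = defaultdict(int)
    for j1, g, a in r1:
        for (j2,) in r2:
            if j1 == j2:
                out[g] += a
    return dict(out)


def eager_sum(r1, r2):
    """Partial SUM on R1 grouped by G1 = (j, g), then join, then final SUM."""
    partial = defaultdict(int)
    for j, g, a in r1:
        partial[(j, g)] += a          # partial aggregation below the join
    out = defaultdict(int)
    for (j1, g), agg_a in partial.items():
        for (j2,) in r2:
            if j1 == j2:
                out[g] += agg_a       # final aggregation combines partial sums
    return dict(out)
```

Both functions return the same result for this data. Because j is part
of G1, every row in a partial group either matches a given R2 row or
does not, so joining the group is the same as joining each of its rows.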
In the case where there are any outer joins, the situation becomes
more complex due to join order constraints and the semantics of
null-extension in outer joins. If the relations that contain at least
one aggregation column cannot be treated as a single relation because
of the join order constraints, partial aggregation paths will not be
generated, and thus the transformation is not applicable.
Otherwise, to preserve correctness, we need to add an additional
condition: R1 must not be on the nullable side of any outer join.
This ensures that partial aggregation over R1 does not suppress any
null-extended rows that would be introduced by outer joins.
I'll update the proof in README to cover the outer join case.
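As an illustration of why the nullable side is a problem, here is one way
the results can diverge, in a toy model with made-up data. It uses
COUNT(*) over R2 LEFT JOIN R1 with R1 on the nullable side, and it
assumes that a NULL partial state is simply skipped by the final
aggregation. That is a modeling assumption for this sketch, not a
statement about the patch.

```python
from collections import defaultdict

# Hypothetical data: R2 rows are (j, g); R1 rows are (j,).
# R1 is on the nullable side of the left join.
R2 = [(1, "x"), (2, "x"), (3, "y")]
R1 = [(1,), (1,), (2,)]


def lazy_count(r2, r1):
    """GROUP BY R2.g, COUNT(*) over (R2 LEFT JOIN R1 ON R2.j = R1.j)."""
    out = defaultdict(int)
    for j2, g in r2:
        matches = [row for row in r1 if row[0] == j2]
        # An unmatched R2 row still yields one NULL-extended row.
        out[g] += len(matches) if matches else 1
    return dict(out)


def eager_count(r2, r1):
    """Partial COUNT(*) on R1 grouped by j, then the left join, then a sum."""
    partial = defaultdict(int)
    for (j1,) in r1:
        partial[j1] += 1
    out = defaultdict(int)
    for j2, g in r2:
        state = partial.get(j2)       # None models a NULL partial state
        out[g] += state if state is not None else 0
    return dict(out)
```

Here the row (3, "y") has no match in R1. The lazy plan counts its
NULL-extended row, while the eager plan has no partial group that could
account for it, so the two plans disagree on group "y".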
> > To address these concerns, I'm thinking that maybe we can adopt a
> > strategy where partial aggregation is only pushed to the lowest
> > possible level in the join tree that is deemed useful. In other
> > words, if we can build a grouped path like "AGG(B) JOIN A" -- and
> > AGG(B) yields a significant reduction in row count -- we skip
> > exploring alternatives like "AGG(A JOIN B)".
> I really like this idea. I believe we need some heuristic here and
> this seems like a reasonable one. I think there could be a better one,
> potentially. For instance, it would be reasonable (in my opinion) to
> do some kind of evaluation of AGG(A JOIN B) vs. AGG(B) JOIN A that
> does not involve performing full path generation for both cases; e.g.
> one could try to decide considering only row counts, for instance.
> However, I'm not saying that would work better than your proposal
> here, or that it should be a requirement for this to be committed;
> it's just an idea. IMHO, the requirement to have something committable
> is that there is SOME heuristic limiting the search space and at the
> same time the patch can still be demonstrated to give SOME benefit. I
> think what you propose here meets those criteria. I also like the fact
> that it's simple and easy to understand. If it does go wrong, it will
> not be too difficult for someone to understand why it has gone wrong,
> which is very desirable.
> > I think this heuristic serves as a good starting point, and we can
> > look into extending it with more advanced strategies as the feature
> > evolves.
> So IOW, +1 to what you say here.
Thanks for liking this idea. Another way this heuristic makes life
easier is that it ensures all grouped paths for the same grouped
relation produce the same set of rows. This means we don't need all
the hacks for comparing costs between grouped paths, nor do we have to
resolve disputes about how many RelOptInfos to create for a single
grouped relation. I'd prefer to keep this property for now and
explore more complex heuristics in the future.
- Richard
* Re: Eager aggregation, take 3
@ 2025-09-09 09:20 Richard Guo <[email protected]>
parent: Robert Haas <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-09-09 09:20 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Fri, Sep 5, 2025 at 10:12 PM Robert Haas <[email protected]> wrote:
> On Wed, Aug 6, 2025 at 3:52 AM Richard Guo <[email protected]> wrote:
> > What we really want to exclude are aggregate functions that can
> > produce large transition values by accumulating or concatenating input
> > rows. So I'm wondering if we could instead check the transfn_oid
> > directly and explicitly exclude only F_ARRAY_AGG_TRANSFN and
> > F_STRING_AGG_TRANSFN. We don't need to worry about json_agg,
> > jsonb_agg, or xmlagg, since they don't support partial aggregation
> > anyway.
> This strategy seems fairly unfriendly towards out-of-core code. Can
> you come up with something that allows the author of a SQL-callable
> function to include or exclude the function by a choice that is under
> their control, rather than hard-coding something in PostgreSQL itself?
Yeah, ideally we should tell whether an aggregate's transition state
may grow unbounded just by looking at system catalogs. Unfortunately,
after trying for a while, it seems to me that the current catalog
doesn't provide enough information.
I once considered adding a flag (e.g., aggtransbounded) to catalog
pg_aggregate to indicate whether the transition state size is bounded.
This flag could be specified by users when creating aggregate
functions, and then leveraged by features such as eager aggregation.
However, adding new information to system catalogs involves a lot of
discussions and changes, including updates to DDL commands, dump and
restore processes, and upgrade procedures. Therefore, to keep the
focus of this patch on the eager aggregation feature itself, I prefer
to treat this enhancement as future work.
- Richard
* Re: Eager aggregation, take 3
@ 2025-09-09 10:30 Richard Guo <[email protected]>
parent: Robert Haas <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-09-09 10:30 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Matheus Alcantara <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Fri, Sep 5, 2025 at 11:37 PM Robert Haas <[email protected]> wrote:
> I spent a bit of time scrolling through this today. Here are a few
> observations/review comments.
Thanks for all the comments.
> It looks as though this will create a bunch of RelOptInfo objects that
> don't end up getting used for anything once the apply_at test in
> generate_grouped_paths() fails. It seems to me that it would be better
> to altogether avoid generating the RelOptInfo in that case.
Hmm, that's not the case. make_grouped_join_rel() guarantees that for
a given relation, if its grouped paths are not considered useful, and
no grouped paths can be built by joining grouped input relations, then
its grouped relation will not be created. IOW, we only create a
grouped RelOptInfo if we've determined that we can generate useful
grouped paths for it.
In the case you mentioned, where the apply_at test in
generate_grouped_paths() fails, it must mean that grouped paths can be
built by joining its outer and inner relations. Also, note that calls
to generate_grouped_paths() are always followed by calls to
set_cheapest(). If we failed to generate any grouped paths for a
grouped relation, the set_cheapest() call should already have reported
an error.
> I think it would be worth considering generating the partially grouped
> relations in a second pass. Right now, as you progress from the bottom
> of the join tree towards the top, you create grouped rels as you go.
> But you could equally well finish planning everything up to the
> scan/join target first and then go back and add grouped_rels to
> relations where it seems worthwhile.
Hmm, I don't think so. I think the presence of eager aggregation
could change the best join order. For example, without eager
aggregation, the optimizer might find that (A JOIN B) JOIN C is the best
join order. But with eager aggregation on B, the optimizer could
prefer A JOIN (AGG(B) JOIN C). I'm not sure how we could find the
best join order with eager aggregation applied without building the
join tree from the bottom up.
> I haven't done a detailed comparison of generate_grouped_paths() to
> other parts of the code, but I have an uncomfortable feeling that it
> might be rather similar to some existing code that probably already
> exists in multiple, slightly-different versions. Is there any
> refactoring we could do here?
Yeah, we currently have several functions that do similar, but not
exactly the same, things. Maybe some refactoring is possible -- maybe
not -- I haven't looked into it closely yet. However, I'd prefer to
address that in a separate patch if possible, since this issue also
exists on master, and I want to avoid introducing such changes in this
already large patch.
> Do you need a test of this feature in combination with GEQO? You have
> code for it but I don't immediately see a test. I didn't check
> carefully, though.
Good point. I have manually tested GEQO by setting geqo_threshold
to 2 and running the regression tests to check for any planning
errors, crashes, or incorrect results. However, I'm not sure where
test cases for GEQO should be added. I searched the regression tests
and found only one explicit GEQO test, added back in 2009 (commit
a43b190e3). It's not quite clear to me what the current policy is for
adding GEQO test cases.
Anyway, I will add some test cases in eager_aggregate.sql with
geqo_threshold set to 2.
> Overall I like the direction this is heading. I don't feel
> well-qualified to evaluate whether all of the things that you're doing
> are completely safe. The logic in is_var_in_aggref_only() and
> is_var_needed_by_join() scares me a bit because I worry that the
> checks are somehow non-exhaustive, but I don't know of a specific
> hazard. That said, I think that modulo such issues, this has a good
> chance of significantly improving performance for certain query
> shapes.
>
> One thing to check might be whether you can construct any cases where
> the strategy is applied too boldly. Given the safeguards you've put in
> place that seems a little hard to construct. The most obvious
> thing that occurs to me is an aggregate where combining is more
> expensive than aggregating, so that the partial aggregation gives the
> appearance of saving more work than it really does, but I can't
> immediately think of a problem case. Another case could be where the
> row counts are off, leading to us mistakenly believing that we're
> going to reduce the number of rows that need to be processed when we
> really don't. Of course, such a case would arguably be a fault of the
> bad row-count estimate rather than this patch, but if the patch has
> that problem frequently, it might need to be addressed. Still, I have
> a feeling that the testing you've already been doing might have
> surfaced such cases if they were common. Have you looked into how many
> queries in the regression tests, or in TPC-H/DS, expend significant
> planning effort on this strategy before discarding it? That might be a
> good way to get a sense of whether the patch is too aggressive, not
> aggressive enough, a mix of the two, or just right.
I previously looked into the TPC-DS queries where eager aggregation
was applied and didn't observe any regressions in planning time or
execution time. I can run TPC-DS again to check the planning time for
the remaining queries.
- Richard
* Re: Eager aggregation, take 3
@ 2025-09-09 11:18 Richard Guo <[email protected]>
parent: Robert Haas <[email protected]>
0 siblings, 0 replies; 55+ messages in thread
From: Richard Guo @ 2025-09-09 11:18 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Fri, Sep 5, 2025 at 11:50 PM Robert Haas <[email protected]> wrote:
> Like Matheus, I think a GUC is reasonable. A significant danger here
> appears to be the possibility of a performance cliff, where queries
> are optimized very differently when the ratio is 9.99 vs. 10.01, say. It
> would be nice if there were some way to mitigate that danger, but at
> least a GUC avoids chaining the performance of the whole system to a
> hard-coded value.
Yeah, I think the performance cliff issue does exist. It might be
mitigated by carefully selecting the threshold value to ensure that
small differences in the average group size near the boundary don't
cause big performance swings with and without eager aggregation, but
this doesn't seem like an easy task.
How is this issue avoided in other thresholds? For example, with
min_parallel_table_scan_size, is there a performance cliff when the
table size is 7.99MB vs. 8.01MB, where a parallel scan is considered
in the latter case but not the former?
> It might be worth considering whether there are heuristics other than
> the group size that could help here. Possibly that's just making
> things more complicated to no benefit. It seems to me, for example,
> that reducing 100 rows to 10 is quite different from reducing a
> million rows to 100,000. On the whole, the latter seems more likely to
> work out well, but it's tricky, because the effort expended per group
> can be arbitrarily high. I think we do want to let the cost model make
> most of the decisions, and just use this threshold to prune ideas that
> are obviously bad at an early stage. That said, it's worth thinking
> about how this interacts with the just-considered-one-eager-agg
> strategy. Does this threshold apply before or after that rule?
If I understand correctly, this means that we need to explore each
join level to find the optimal position for applying partial
aggregation. For example, if Agg(B) reduces 100 rows to 10, and
Agg(A JOIN B) reduces a million rows to 100,000, it might be better to
apply partial aggregation at the (A JOIN B) level rather than just
over B. However, that's not always the case: the Agg(B) option can
reduce the number of input rows to the join earlier, potentially
outperforming the Agg(A JOIN B) approach. Therefore, we need to
consider both options and compare their costs.
This is actually what the patch used to do before I introduced the
always-push-to-lowest heuristic.
> For instance, consider AGG(FACT_TABLE JOIN DIMENSION_TABLE), like a
> count of orders grouped by customer name. Aggregating on the dimension
> table (in this case, the list of customers) is probably useless, but
> aggregating on the join column of the fact table has a good chance of
> being useful. If we consider only one of those strategies, we want it
> to be the right one. This threshold could be the thing that helps us
> to get it right.
Now I see what you meant. However, in the current implementation, we
only push partial aggregation down to relations that contain all the
aggregation columns. So, in the case you mentioned, if the
aggregation columns come from the dimension table, unfortunately, we
don't have the option to partially aggregate the fact table.
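The restriction described above can be pictured as a simple containment
test. This is only a sketch of the rule as stated in this message; the
function name and the relation names are hypothetical.

```python
def contains_all_agg_columns(candidate_rels, agg_column_rels):
    """True if every relation that supplies an aggregate argument is part
    of the candidate relation for partial aggregation."""
    return set(agg_column_rels) <= set(candidate_rels)


# Aggregate arguments come from the dimension table only.
agg_rels = {"customers"}

on_fact = contains_all_agg_columns({"orders"}, agg_rels)
on_dimension = contains_all_agg_columns({"customers"}, agg_rels)
on_join = contains_all_agg_columns({"orders", "customers"}, agg_rels)
```

Under this rule the fact table alone is not a candidate in that case,
while the dimension table and the join of the two are.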
The paper does discuss several other transformations, such as "Eager
Count", "Double Eager", and "Eager Split", that can perform partial
aggregation on relations that don't contain aggregation columns, or
even on both sides of the join. However, those are beyond the scope
of this patch.
- Richard
* Re: Eager aggregation, take 3
@ 2025-09-09 14:20 Robert Haas <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Robert Haas @ 2025-09-09 14:20 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Tue, Sep 9, 2025 at 5:20 AM Richard Guo <[email protected]> wrote:
> Yeah, ideally we should tell whether an aggregate's transition state
> may grow unbounded just by looking at system catalogs. Unfortunately,
> after trying for a while, it seems to me that the current catalog
> doesn't provide enough information.
>
> I once considered adding a flag (e.g., aggtransbounded) to catalog
> pg_aggregate to indicate whether the transition state size is bounded.
> This flag could be specified by users when creating aggregate
> functions, and then leveraged by features such as eager aggregation.
>
> However, adding new information to system catalogs involves a lot of
> discussions and changes, including updates to DDL commands, dump and
> restore processes, and upgrade procedures. Therefore, to keep the
> focus of this patch on the eager aggregation feature itself, I prefer
> to treat this enhancement as future work.
I don't really like that. I think there's a lot of danger of that
future work never getting done, and thus leaving us stuck more-or-less
permanently with a system that's not really extensible. Data type and
function extensibility is one of the strongest areas of PostgreSQL,
and we should try hard to avoid situations where we regress it. I'm
not sure whether the aggtransbounded flag is exactly the right thing
here, but I don't think adding a new catalog column is an unreasonable
amount of work for a feature of this type.
Having said that, I wonder whether there's some way that we could use
the aggtransspace property for this. For instance, for stadistinct, we
use values >0 to mean absolute quantities and values <0 to mean
proportions. The current definition of aggtransspace assigns no meaning
to values <0, and the current coding seems to assume that sizes are
fixed regardless of how many inputs are supplied. Maybe we could
define aggtransspace<0 to mean that the number of bytes used per input
value is the additive inverse of the value, or something like that.
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: Eager aggregation, take 3
@ 2025-09-09 14:30 Robert Haas <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 0 replies; 55+ messages in thread
From: Robert Haas @ 2025-09-09 14:30 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Matheus Alcantara <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Tue, Sep 9, 2025 at 6:30 AM Richard Guo <[email protected]> wrote:
> > I think it would be worth considering generating the partially grouped
> > relations in a second pass. Right now, as you progress from the bottom
> > of the join tree towards the top, you create grouped rels as you go.
> > But you could equally well finish planning everything up to the
> > scan/join target first and then go back and add grouped_rels to
> > relations where it seems worthwhile.
>
> Hmm, I don't think so. I think the presence of eager aggregation
> could change the best join order. For example, without eager
> aggregation, the optimizer might find that (A JOIN B) JOIN C is the best
> join order. But with eager aggregation on B, the optimizer could
> prefer A JOIN (AGG(B) JOIN C). I'm not sure how we could find the
> best join order with eager aggregation applied without building the
> join tree from the bottom up.
Oh, that is a problem, yes. :-(
> > I haven't done a detailed comparison of generate_grouped_paths() to
> > other parts of the code, but I have an uncomfortable feeling that it
> > might be rather similar to some existing code that probably already
> > exists in multiple, slightly-different versions. Is there any
> > refactoring we could do here?
>
> Yeah, we currently have several functions that do similar, but not
> exactly the same, things. Maybe some refactoring is possible -- maybe
> not -- I haven't looked into it closely yet. However, I'd prefer to
> address that in a separate patch if possible, since this issue also
> exists on master, and I want to avoid introducing such changes in this
> already large patch.
Well, it's not just a matter of "this already exists" -- it gets
harder and harder to unify things the more near-copies you add.
> Good point. I have manually tested GEQO by setting geqo_threshold
> to 2 and running the regression tests to check for any planning
> errors, crashes, or incorrect results. However, I'm not sure where
> test cases for GEQO should be added. I searched the regression tests
> and found only one explicit GEQO test, added back in 2009 (commit
> a43b190e3). It's not quite clear to me what the current policy is for
> adding GEQO test cases.
>
> Anyway, I will add some test cases in eager_aggregate.sql with
> geqo_threshold set to 2.
Sounds good. I think GEQO is mostly-unmaintained these days, but if
we're updating the code, I think it is good to add tests. Being that
the code is so old, it probably lacks adequate test coverage.
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: Eager aggregation, take 3
@ 2025-09-12 09:34 Richard Guo <[email protected]>
parent: Robert Haas <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-09-12 09:34 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Tue, Sep 9, 2025 at 11:20 PM Robert Haas <[email protected]> wrote:
> Having said that, I wonder whether there's some way that we could use
> the aggtransspace property for this. For instance, for stadistinct, we
> use values >0 to mean absolute quantities and values <0 to mean
> proportions. The current definition of aggtransspace assigns no meaning
> to values <0, and the current coding seems to assume that sizes are
> fixed regardless of how many inputs are supplied. Maybe we could
> define aggtransspace<0 to mean that the number of bytes used per input
> value is the additive inverse of the value, or something like that.
I really like this idea. Currently, aggtransspace represents an
estimate of the transition state size provided by the aggregate
definition. If it's set to zero, a default estimate based on the
state data type is used. Negative values currently have no defined
meaning. I think it makes perfect sense to reuse this field so that
a negative value indicates that the transition state data can grow
unboundedly in size.
Attached 0002 implements this idea. It requires fewer code changes
than I expected. This is mainly because our current code uses
aggtransspace in such a way that if it's a positive value, that value
is used as it's provided by the aggregate definition; otherwise, some
heuristics are applied to estimate the size. For the aggregates that
accumulate input rows (e.g., array_agg, string_agg), I don't currently
have a better heuristic for estimating their size, so I've chosen to
keep the current logic. This won't regress anything in estimating
transition state data size.
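The interpretation described above can be summarized in a small sketch.
It is a toy model of the semantics as stated in this message, not the
code in 0002; the function names and the default_for_type parameter are
hypothetical.

```python
def transition_state_is_unbounded(aggtransspace: int) -> bool:
    """Proposed reading: a negative value marks a transition state that
    may grow without bound as more inputs are accumulated."""
    return aggtransspace < 0


def estimated_state_size(aggtransspace: int, default_for_type: int) -> int:
    """A positive declared value is used as provided; otherwise fall back
    to a heuristic, modeled here as a caller-supplied default."""
    if aggtransspace > 0:
        return aggtransspace
    return default_for_type
```

With this reading, a negative value changes only the answer to the
"unbounded" question. The size estimate still falls back to the same
heuristic as for zero, which matches the point that nothing regresses in
estimating the transition state size.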
- Richard
Attachments:
[application/octet-stream] v22-0001-Implement-Eager-Aggregation.patch (184.9K, 2-v22-0001-Implement-Eager-Aggregation.patch)
From 8a780d897ec5205a48867f3dc291edf80707aca3 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v22 1/2] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is on the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo <[email protected]>
Author: Antonin Houska <[email protected]> (in an older version)
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jian He <[email protected]>
Reviewed-by: Tender Wang <[email protected]>
Reviewed-by: Matheus Alcantara <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]> (in an older version)
Reviewed-by: Andy Fan <[email protected]> (in an older version)
Reviewed-by: Ashutosh Bapat <[email protected]> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 110 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 453 +++++
src/backend/optimizer/path/joinrels.c | 193 ++
src/backend/optimizer/plan/initsplan.c | 323 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 59 +
src/backend/optimizer/util/relnode.c | 628 +++++++
src/backend/utils/misc/guc_parameters.dat | 16 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 130 ++
src/include/optimizer/pathnode.h | 5 +
src/include/optimizer/paths.h | 6 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1584 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 225 +++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
26 files changed, 3951 insertions(+), 74 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 18d727d7790..f1b2d684e35 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,30 +3701,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 2a3685f474a..bac3c3270a0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5475,6 +5475,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6095,6 +6110,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 843368096fd..6c35baceedb 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1500,3 +1500,113 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, let's first consider the
+case where only inner joins are involved. In this case, we partition
+the tables in the FROM clause into two groups: those that contain at
+least one aggregation column, and those that do not contain any
+aggregation columns. Each group can be treated as a single relation
+formed by the Cartesian product of the tables within that group.
+Therefore, without loss of generality, we can assume that the FROM
+clause contains exactly two relations, R1 and R2, where R1 represents
+the relation containing all aggregation columns, and R2 represents the
+relation without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+In the case where there are any outer joins, the situation becomes
+more complex due to join order constraints and the semantics of
+null-extension in outer joins. If the relations that contain at least
+one aggregation column cannot be treated as a single relation because
+of the join order constraints, partial aggregation paths will not be
+generated, and thus the transformation is not applicable. Otherwise,
+let R1 be the relation containing all aggregation columns, and R2, R3,
+... be the remaining relations. From the inner join case, under the
+aforementioned conditions, we have the equivalence:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 JOIN R3 ...)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 JOIN R3 ...)
+
+To preserve correctness when outer joins are involved, we require an
+additional condition:
+
+4) R1 must not be on the nullable side of any outer join.
+
+This condition ensures that partial aggregation over R1 does not
+suppress any NULL-extended rows that would be introduced by outer
+joins. If R1 is on the nullable side of an outer join, the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..7b349a4570e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +334,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +603,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1358,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3418,328 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ agg_info->group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3899,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3923,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4813,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 535248aa525..04cbbcea2a4 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -36,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -762,6 +766,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -873,6 +881,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..1b778f692d4 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,9 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +633,324 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Check if any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate uses INTERNAL transition type. Although INTERNAL
+ * is marked as pass-by-value, it usually points to a large internal data
+ * structure (like those used by string_agg or array_agg). These transition
+ * states can grow large and their size is hard to estimate. Applying eager
+ * aggregation in such cases risks high memory usage since partial aggregation
+ * results might be stored in join hash tables or materialized nodes.
+ *
+ * We explicitly exclude aggregates whose transition function is
+ * numeric_avg_accum or int8_avg_accum from this check, based on the
+ * assumption that avg() and sum() are safe in this context.
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM ||
+ transinfo->transfn_oid == F_INT8_AVG_ACCUM)
+ continue;
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *btree_opfamilies = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ btree_opfamilies = lappend_oid(btree_opfamilies, tce->btree_opf);
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ Oid btree_opfamily = lfirst_oid(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->btree_opfamily = btree_opfamily;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41bd8353430..462c5335589 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -232,7 +232,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4010,9 +4009,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4094,23 +4091,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7055,16 +7045,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7177,7 +7193,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7203,7 +7219,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7245,19 +7261,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7297,6 +7311,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7307,6 +7322,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7328,11 +7352,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7557,6 +7583,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8120,13 +8191,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..11c0eb0d180 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,65 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newinfo, oldinfo, sizeof(RelAggInfo));
+
+ newinfo->relids = adjust_child_relids(oldinfo->relids,
+ nappinfos, appinfos);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses,
+ context);
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0e523d2eb5b..faa44e46594 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -278,6 +292,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -408,6 +424,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions, otherwise we cannot reach here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -759,6 +872,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -945,6 +1060,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2523,3 +2640,514 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must already
+ * have been created if that was possible. So we can just use the parent's
+ * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+
+ /* Calculate pathkeys that represent these grouping requirements */
+ result->group_pathkeys =
+ make_pathkeys_for_sortclauses(root, result->group_clauses,
+ make_tlist_from_pathtarget(target));
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ result->relids = bms_copy(rel->relids);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+ result->apply_at = NULL; /* caller will change this later */
+
+ /*
+ * The grouped paths for the given relation are considered useful iff the
+ * average group size is no less than min_eager_agg_group_size.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is on the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+
+ Assert(IsA(ge_info->expr, Var));
+
+ if (equal(ge_info->expr, expr) ||
+ exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr,
+ ge_info->btree_opfamily))
+ {
+ Assert(ge_info->sortgroupref > 0);
+
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
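The eligibility checks in the functions above all reduce to set tests on relids. As a rough, standalone illustration (not part of the patch), the sketch below models Relids as a 64-bit mask. The type relid_set and every function name in it are hypothetical stand-ins chosen for this example; only the set logic mirrors bms_is_subset() and bms_nonempty_difference() as they are used in eager_aggregation_possible_for_relation() and is_var_needed_by_join().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for Relids: bit i set means RT index i is a member.
 * Bit 0 plays the role of "relation 0", i.e. the final targetlist.
 */
typedef uint64_t relid_set;

static inline relid_set
relid_bit(int relid)
{
	assert(relid >= 0 && relid < 64);
	return (relid_set) 1 << relid;
}

/* True if every member of a is also in b (same idea as bms_is_subset). */
static inline bool
set_is_subset(relid_set a, relid_set b)
{
	return (a & ~b) == 0;
}

/* True if a has a member that is not in b (cf. bms_nonempty_difference). */
static inline bool
set_nonempty_difference(relid_set a, relid_set b)
{
	return (a & ~b) != 0;
}

/*
 * Nullable-side test: a base rel can take part in a grouped rel only if
 * every outer join that can null it is already contained in that rel.
 */
static inline bool
nulling_joins_are_inside(relid_set nulling_relids, relid_set rel_relids)
{
	return set_is_subset(nulling_relids, rel_relids);
}

/* Aggregate test: the aggregate must be evaluable within this rel alone. */
static inline bool
aggregate_fits(relid_set agg_eval_at, relid_set rel_relids)
{
	return set_is_subset(agg_eval_at, rel_relids);
}

/*
 * Join-need test: the Var is needed somewhere other than inside this rel
 * or the final targetlist, so it has to become a grouping key.
 */
static inline bool
var_needed_above(relid_set attr_needed, relid_set rel_relids)
{
	return set_nonempty_difference(attr_needed, rel_relids | relid_bit(0));
}
```

With t1 as RT index 1, t2 as RT index 2 and an outer join as RT index 3, a Var of t2 whose attr_needed is {1, 2} counts as needed above the rel {2}, while a Var whose attr_needed is only {0} does not.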
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 0da01627cfe..f35dd1b23bf 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -145,6 +145,13 @@
boot_val => 'false',
},
+{ name => 'enable_eager_aggregate', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+ short_desc => 'Enables eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'enable_eager_aggregate',
+ boot_val => 'true',
+},
+
{ name => 'enable_parallel_append', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
short_desc => 'Enables the planner\'s use of parallel append plans.',
flags => 'GUC_EXPLAIN',
@@ -2427,6 +2434,15 @@
max => 'DBL_MAX',
},
+{ name => 'min_eager_agg_group_size', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_COST',
+ short_desc => 'Sets the minimum average group size required to consider applying eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'min_eager_agg_group_size',
+ boot_val => '8.0',
+ min => '0.0',
+ max => 'DBL_MAX',
+},
+
{ name => 'cursor_tuple_fraction', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_OTHER',
short_desc => 'Sets the planner\'s estimate of the fraction of a cursor\'s rows that will be retrieved.',
flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 26c08693564..7325bcd439d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
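To make the effect of min_eager_agg_group_size concrete, here is a minimal standalone sketch of the usefulness test that the patch applies when it sets agg_useful. The function name grouped_paths_useful() is hypothetical, and the sketch assumes grouped_rows is already a positive estimate; the comparison is the one shown in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Grouped paths are considered useful iff the average partial group size
 * (input rows divided by estimated groups) reaches the configured minimum.
 * Assumes grouped_rows has been clamped to a positive value by the caller.
 */
static inline bool
grouped_paths_useful(double rows, double grouped_rows, double min_group_size)
{
	assert(grouped_rows > 0.0);
	return (rows / grouped_rows) >= min_group_size;
}
```

With the default of 8.0, a 1000-row relation estimated to produce 10 groups (average group size 100) qualifies, whereas the same relation grouped on a unique column (average group size 1) does not.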
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4a903d1ec18..ad211207343 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -397,6 +397,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupingExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1046,6 +1055,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1130,6 +1147,75 @@ typedef struct RelOptInfo
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "relids" is the set of relation identifiers (RT indexes).
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "group_clauses", "group_exprs" and "group_pathkeys" are lists of
+ * SortGroupClauses, the corresponding grouping expressions and PathKeys
+ * respectively.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* set of base + OJ relids (rangetable indexes) */
+ Relids relids;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+ /* a list of PathKeys */
+ List *group_pathkeys;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* the grouped paths are considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3283,6 +3369,50 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the btree opfamily used for equality comparison. This information is
+ * necessary to reproduce correct grouping semantics at different levels of the
+ * join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* btree opfamily defining the ordering */
+ Oid btree_opfamily;
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 763cd25bb3c..5b9c1daf14b 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -312,6 +312,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -351,4 +355,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cbade77b717..8d03d662a04 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..0dab585e9ce
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1584 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+--
+-- Test eager aggregation with GEQO
+--
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 04079268b98..d0bb66f43da 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2837,20 +2837,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index 5f2c0cf5786..1f56f55155b 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..8b1049ae3f3
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,225 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+--
+-- Test eager aggregation with GEQO
+--
+
+SET geqo = on;
+SET geqo_threshold = 2;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index a13e8162890..9a4567db01a 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2473,6 +2475,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.39.5 (Apple Git-154)
[application/octet-stream] v22-0002-Allow-negative-aggtransspace-to-indicate-unbound.patch (8.4K, 3-v22-0002-Allow-negative-aggtransspace-to-indicate-unbound.patch)
From ec282bb7fb963325a30a3e94375289aa5457004b Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 12 Sep 2025 13:11:47 +0900
Subject: [PATCH v22 2/2] Allow negative aggtransspace to indicate unbounded
state size
This patch reuses the existing aggtransspace in pg_aggregate to
signal that an aggregate's transition state can grow unboundedly. If
aggtransspace is set to a negative value, it now indicates that the
transition state may consume unpredictable or large amounts of memory,
such as in aggregates like array_agg or string_agg that accumulate
input rows.
This information can be used by the planner to avoid applying
memory-sensitive optimizations (e.g., eager aggregation) when there is
a risk of excessive memory usage during partial aggregation.
Bump catalog version.
---
doc/src/sgml/catalogs.sgml | 5 ++++-
doc/src/sgml/ref/create_aggregate.sgml | 11 ++++++++---
src/backend/optimizer/plan/initsplan.c | 23 +++++++----------------
src/include/catalog/catversion.h | 2 +-
src/include/catalog/pg_aggregate.dat | 10 ++++++----
src/test/regress/expected/opr_sanity.out | 2 +-
src/test/regress/sql/opr_sanity.sql | 2 +-
7 files changed, 28 insertions(+), 27 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e9095bedf21..3acc2222a87 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -596,7 +596,10 @@
</para>
<para>
Approximate average size (in bytes) of the transition state
- data, or zero to use a default estimate
+ data. A positive value provides an estimate; zero means to
+ use a default estimate. A negative value indicates the state
+ data can grow unboundedly in size, such as when the aggregate
+ accumulates input rows (e.g., array_agg, string_agg).
</para></entry>
</row>
diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index 222e0aa5c9d..0472ac2e874 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -384,9 +384,13 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state value.
If this parameter is omitted or is zero, a default estimate is used
- based on the <replaceable>state_data_type</replaceable>.
+ based on the <replaceable>state_data_type</replaceable>. If set to a
+ negative value, it indicates the state data can grow unboundedly in
+ size, such as when the aggregate accumulates input rows (e.g.,
+ array_agg, string_agg).
The planner uses this value to estimate the memory required for a
- grouped aggregate query.
+ grouped aggregate query and to avoid optimizations that may cause
+ excessive memory usage.
</para>
</listitem>
</varlistentry>
@@ -568,7 +572,8 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state
value, when using moving-aggregate mode. This works the same as
- <replaceable>state_data_size</replaceable>.
+ <replaceable>state_data_size</replaceable>, except that negative
+ values are not used to indicate unbounded state size.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 1b778f692d4..cb29c72c96c 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -716,19 +716,14 @@ setup_eager_aggregation(PlannerInfo *root)
/*
* is_partial_agg_memory_risky
- * Checks if any aggregate poses a risk of excessive memory usage during
+ * Check if any aggregate poses a risk of excessive memory usage during
* partial aggregation.
*
- * We check if any aggregate uses INTERNAL transition type. Although INTERNAL
- * is marked as pass-by-value, it usually points to a large internal data
- * structure (like those used by string_agg or array_agg). These transition
- * states can grow large and their size is hard to estimate. Applying eager
- * aggregation in such cases risks high memory usage since partial aggregation
- * results might be stored in join hash tables or materialized nodes.
- *
- * We explicitly exclude aggregates with AVG_ACCUM transition function from
- * this check, based on the assumption that avg() and sum() are safe in this
- * context.
+ * We check if any aggregate has a negative aggtransspace value, which
+ * indicates that its transition state data can grow unboundedly in size.
+ * Applying eager aggregation in such cases risks high memory usage since
+ * partial aggregation results might be stored in join hash tables or
+ * materialized nodes.
*/
static bool
is_partial_agg_memory_risky(PlannerInfo *root)
@@ -739,11 +734,7 @@ is_partial_agg_memory_risky(PlannerInfo *root)
{
AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
- if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM ||
- transinfo->transfn_oid == F_INT8_AVG_ACCUM)
- continue;
-
- if (transinfo->aggtranstype == INTERNALOID)
+ if (transinfo->aggtransspace < 0)
return true;
}
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index ef0d0f92165..62b0af3e0c3 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 202509091
+#define CATALOG_VERSION_NO 202509121
#endif
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index d6aa1f6ec47..870769e8f14 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -558,26 +558,28 @@
aggfinalfn => 'array_agg_finalfn', aggcombinefn => 'array_agg_combine',
aggserialfn => 'array_agg_serialize',
aggdeserialfn => 'array_agg_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
{ aggfnoid => 'array_agg(anyarray)', aggtransfn => 'array_agg_array_transfn',
aggfinalfn => 'array_agg_array_finalfn',
aggcombinefn => 'array_agg_array_combine',
aggserialfn => 'array_agg_array_serialize',
aggdeserialfn => 'array_agg_array_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
# text
{ aggfnoid => 'string_agg(text,text)', aggtransfn => 'string_agg_transfn',
aggfinalfn => 'string_agg_finalfn', aggcombinefn => 'string_agg_combine',
aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# bytea
{ aggfnoid => 'string_agg(bytea,bytea)',
aggtransfn => 'bytea_string_agg_transfn',
aggfinalfn => 'bytea_string_agg_finalfn',
aggcombinefn => 'string_agg_combine', aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# range
{ aggfnoid => 'range_intersect_agg(anyrange)',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 20bf9ea9cdf..a357e1d0c0e 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -1470,7 +1470,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
ctid | aggfnoid
------+----------
(0 rows)
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index 2fb3a852878..cd674d7dbca 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -847,7 +847,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
-- Make sure the matching pg_proc entry is sensible, too.
--
2.39.5 (Apple Git-154)
* Re: Eager aggregation, take 3
@ 2025-09-12 18:47 Robert Haas <[email protected]>
parent: Richard Guo <[email protected]>
From: Robert Haas @ 2025-09-12 18:47 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Fri, Sep 12, 2025 at 5:34 AM Richard Guo <[email protected]> wrote:
> I really like this idea. Currently, aggtransspace represents an
> estimate of the transition state size provided by the aggregate
> definition. If it's set to zero, a default estimate based on the
> state data type is used. Negative values currently have no defined
> meaning. I think it makes perfect sense to reuse this field so that
> a negative value indicates that the transition state data can grow
> unboundedly in size.
>
> Attached 0002 implements this idea. It requires fewer code changes
> than I expected. This is mainly because that our current code uses
> aggtransspace in such a way that if it's a positive value, that value
> is used as it's provided by the aggregate definition; otherwise, some
> heuristics are applied to estimate the size. For the aggregates that
> accumulate input rows (e.g., array_agg, string_agg), I don't currently
> have a better heuristic for estimating their size, so I've chosen to
> keep the current logic. This won't regress anything in estimating
> transition state data size.
This might be OK, but it's not what I was suggesting: I was suggesting
trying to do a calculation like space_used = -aggtransspace *
rowcount, not just using a <0 value as a sentinel.
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: Eager aggregation, take 3
@ 2025-09-13 08:27 Richard Guo <[email protected]>
parent: Robert Haas <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-09-13 08:27 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Sat, Sep 13, 2025 at 3:48 AM Robert Haas <[email protected]> wrote:
> On Fri, Sep 12, 2025 at 5:34 AM Richard Guo <[email protected]> wrote:
> > I really like this idea. Currently, aggtransspace represents an
> > estimate of the transition state size provided by the aggregate
> > definition. If it's set to zero, a default estimate based on the
> > state data type is used. Negative values currently have no defined
> > meaning. I think it makes perfect sense to reuse this field so that
> > a negative value indicates that the transition state data can grow
> > unboundedly in size.
> >
> > Attached 0002 implements this idea. It requires fewer code changes
> > than I expected. This is mainly because our current code uses
> > aggtransspace in such a way that if it's a positive value, that value
> > is used as it's provided by the aggregate definition; otherwise, some
> > heuristics are applied to estimate the size. For the aggregates that
> > accumulate input rows (e.g., array_agg, string_agg), I don't currently
> > have a better heuristic for estimating their size, so I've chosen to
> > keep the current logic. This won't regress anything in estimating
> > transition state data size.
> This might be OK, but it's not what I was suggesting: I was suggesting
> trying to do a calculation like space_used = -aggtransspace *
> rowcount, not just using a <0 value as a sentinel.
I've considered your suggestion, but I'm not sure I'll adopt it in the
end. Here's why:
1) At the point where we check whether any aggregates might pose a
risk of excessive memory usage during partial aggregation, row count
information is not yet available. You could argue that we could
reorganize the logic to perform this check after we've had the row
count, but that seems quite tricky. If I understand correctly, the
"rowcount" in this context actually means the number of rows within
one partial group. That would require us to first decide on the
grouping expressions for the partial aggregation, then compute the
group row counts, then estimate space usage, and only then decide
whether memory usage is excessive and fall back. This would come
quite late in planning and adds nontrivial overhead, compared to the
current approach which checks at the very beginning.
2) Even if we were able to estimate space usage based on the number of
rows per partial group and determined that memory usage seems
acceptable, we still couldn't guarantee that the transition state data
won't grow excessively after further joins. Joins can multiply
partial aggregates, potentially causing a blowup in memory usage even
if the initial estimate seemed safe.
3) I don't think "-aggtransspace * rowcount" reflects the true memory
footprint for aggregates that accumulate input rows. For example,
what if we have an aggregate like string_agg(somecolumn, 'a very long
delimiter')?
4) AFAICS, the main downside of the current approach compared to yours
is that it avoids pushing down aggregates like string_agg() that
accumulate input rows, whereas your suggestion might allow pushing
them down in some cases where we *think* it wouldn't blow up memory.
You might argue that the current implementation is over-conservative.
But I prefer to start safe.
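To make the two readings of a negative aggtransspace concrete, here is a
minimal sketch in C. The function names are hypothetical and are not
PostgreSQL functions; the handling of zero is also a simplification,
since the existing default heuristics are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers for illustration only.  'aggtransspace' follows
 * the convention discussed in this thread: > 0 is a declared size in
 * bytes, 0 means "use a default estimate", < 0 is the new case.
 */

/* Sentinel reading: any negative value marks unbounded growth. */
static bool
transstate_may_grow_unbounded(int aggtransspace)
{
	return aggtransspace < 0;
}

/*
 * Scaled reading: a negative value is treated as a per-row size, so the
 * estimate is -aggtransspace * rows_per_group.  A positive value is used
 * as given.  Zero returns -1.0 because this sketch does not model the
 * default heuristics.
 */
static double
scaled_transstate_size(int aggtransspace, double rows_per_group)
{
	assert(rows_per_group >= 0.0);

	if (aggtransspace > 0)
		return (double) aggtransspace;
	if (aggtransspace == 0)
		return -1.0;
	return -(double) aggtransspace * rows_per_group;
}
```

The sentinel form needs only the catalog value, which is why it can run
before any row counts exist. The scaled form needs rows_per_group, which
is the quantity point 1) says is not available at that stage.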
That said, I appreciate you proposing the idea of reusing
aggtransspace, although I ended up using it in a different way than
you suggested.
- Richard
* Re: Eager aggregation, take 3
@ 2025-09-25 04:23 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-09-25 04:23 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
I've run TPC-DS again to compare planning times with and without eager
aggregation. Out of 99 queries, only one query (query 64) shows a
noticeable increase in planning time. This query performs inner joins
across 38 tables, which is a very large search space. (I'm talking
about the standard join search method, not GEQO.)
If my math doesn't fail me, the maximum number of different join
orders when joining n tables is: Catalan(n − 1) x n!. For n = 38,
this number is astronomically large. In practice, query 64 joins 19
tables twice (due to a CTE), which still results in about 3.4E28
different join orders.
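For anyone who wants to evaluate the formula for other values of n, here
is a small sketch in C. It relies on the identity
Catalan(n - 1) * n! = (2n - 2)! / (n - 1)!, which is the product
n * (n + 1) * ... * (2n - 2). The function name is hypothetical, and the
result is a double, so it is only approximate for large n.

```c
#include <assert.h>

/*
 * Upper bound on the number of join orders for n relations:
 *   Catalan(n - 1) * n!  ==  (2n - 2)! / (n - 1)!
 *                        ==  n * (n + 1) * ... * (2n - 2)
 * Computed as a running product in double precision.
 */
static double
max_join_orders(int n)
{
	double		result = 1.0;

	assert(n >= 1);

	for (int k = n; k <= 2 * n - 2; k++)
		result *= (double) k;

	return result;
}
```

For example, max_join_orders(3) is 12 (two tree shapes times 3! leaf
orderings) and max_join_orders(4) is 120.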
Of course, with the help of join_collapse_limit and other
heuristics, the effective search space is reduced a lot, but even
then, it remains very large. Given this, I'm not too surprised that
query 64 shows an increase in planning time when eager aggregation is
applied -- exploring the best join order in such a space is inherently
expensive.
That said, I've identified a few performance hotspots that can be
optimized to help reduce planning time:
1) the exprs_known_equal() call in get_expression_sortgroupref(),
which is used to check if a given expression is known equal to a
grouping expression due to ECs. We can optimize this by storing the
EC of each grouping expression, and then get_expression_sortgroupref()
would only need to search the relevant EC, rather than scanning all of
them.
2) the estimate_num_groups() call in create_rel_agg_info(). We can
optimize this by avoiding unnecessary calls to estimate_num_groups()
where possible.
Attached is an updated version of the patch with these optimizations
applied. With this patch, the planning times for query 64, with and
without eager aggregation, are:
-- with eager aggregation
Planning Time: 9432.042 ms
-- without eager aggregation
Planning Time: 7196.999 ms
I think the increase in planning time (about 31%) is acceptable given
the large search space involved, though I may be biased.
- Richard
Attachments:
[application/octet-stream] v23-0001-Implement-Eager-Aggregation.patch (187.4K, 2-v23-0001-Implement-Eager-Aggregation.patch)
From 63d36fe266e5c8ab19079698a3ea5e9abb3218bd Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v23 1/2] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
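As a toy illustration of the "same destiny" argument (not code from the
patch), the sketch below computes SUM(a) and COUNT(*) grouped by g over
R1 JOIN R2 ON R1.j = R2.j in two ways: aggregating after the join, and
partially aggregating R1 by its join column j before the join. All type
and function names are hypothetical, and keys are assumed to be small
integers so that plain arrays can stand in for hash tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NKEYS 4					/* join and group keys lie in [0, NKEYS) */

typedef struct { int j; int a; } R1Row;	/* side holding the aggregated column */
typedef struct { int j; int g; } R2Row;	/* side holding the grouping column */

/* Plain plan: join first, then aggregate every matching pair by g. */
static void
agg_after_join(const R1Row *r1, int n1, const R2Row *r2, int n2,
			   long *sum, long *cnt)
{
	memset(sum, 0, NKEYS * sizeof(long));
	memset(cnt, 0, NKEYS * sizeof(long));
	for (int i = 0; i < n1; i++)
		for (int k = 0; k < n2; k++)
			if (r1[i].j == r2[k].j)
			{
				sum[r2[k].g] += r1[i].a;
				cnt[r2[k].g] += 1;
			}
}

/* Eager plan: partially aggregate R1 by j, join to R2, finalize by g. */
static void
eager_agg(const R1Row *r1, int n1, const R2Row *r2, int n2,
		  long *sum, long *cnt)
{
	long		psum[NKEYS] = {0};
	long		pcnt[NKEYS] = {0};

	memset(sum, 0, NKEYS * sizeof(long));
	memset(cnt, 0, NKEYS * sizeof(long));

	/* Partial aggregation: one partial group per join-key value. */
	for (int i = 0; i < n1; i++)
	{
		psum[r1[i].j] += r1[i].a;
		pcnt[r1[i].j] += 1;
	}

	/* Each R2 row matches a whole partial group or nothing at all. */
	for (int k = 0; k < n2; k++)
		if (pcnt[r2[k].j] > 0)
		{
			sum[r2[k].g] += psum[r2[k].j];
			cnt[r2[k].g] += pcnt[r2[k].j];
		}
}

/* Runs both plans on a fixed sample and reports whether they agree. */
static bool
sample_plans_agree(void)
{
	const R1Row r1[] = {{0, 5}, {0, 7}, {1, 3}, {2, 9}, {2, 1}, {3, 4}};
	const R2Row r2[] = {{0, 1}, {0, 2}, {2, 1}, {1, 3}, {1, 3}};
	long		s1[NKEYS], c1[NKEYS], s2[NKEYS], c2[NKEYS];

	agg_after_join(r1, 6, r2, 5, s1, c1);
	eager_agg(r1, 6, r2, 5, s2, c2);

	return memcmp(s1, s2, sizeof(s1)) == 0 &&
		memcmp(c1, c2, sizeof(c1)) == 0;
}
```

Because the partial groups are keyed by the join column, a row of R2
either matches every row in a partial group or none of them, which is
what lets the finalize step combine partial sums and counts directly.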
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo <[email protected]>
Author: Antonin Houska <[email protected]> (in an older version)
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jian He <[email protected]>
Reviewed-by: Tender Wang <[email protected]>
Reviewed-by: Matheus Alcantara <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]> (in an older version)
Reviewed-by: Andy Fan <[email protected]> (in an older version)
Reviewed-by: Ashutosh Bapat <[email protected]> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 110 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +
src/backend/optimizer/path/allpaths.c | 469 +++++
src/backend/optimizer/path/joinrels.c | 193 ++
src/backend/optimizer/plan/initsplan.c | 379 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 51 +
src/backend/optimizer/util/relnode.c | 650 +++++++
src/backend/utils/misc/guc_parameters.dat | 16 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 121 ++
src/include/optimizer/pathnode.h | 6 +
src/include/optimizer/paths.h | 6 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1584 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 225 +++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
26 files changed, 4029 insertions(+), 74 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6dc04e916dc..f5a57b9cbd5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,30 +3701,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e9b420f3ddb..39e658b7808 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5475,6 +5475,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6095,6 +6110,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 843368096fd..6c35baceedb 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1500,3 +1500,113 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, let's first consider the
+case where only inner joins are involved. In this case, we partition
+the tables in the FROM clause into two groups: those that contain at
+least one aggregation column, and those that do not contain any
+aggregation columns. Each group can be treated as a single relation
+formed by the Cartesian product of the tables within that group.
+Therefore, without loss of generality, we can assume that the FROM
+clause contains exactly two relations, R1 and R2, where R1 represents
+the relation containing all aggregation columns, and R2 represents the
+relation without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+In the case where there are any outer joins, the situation becomes
+more complex due to join order constraints and the semantics of
+null-extension in outer joins. If the relations that contain at least
+one aggregation column cannot be treated as a single relation because
+of the join order constraints, partial aggregation paths will not be
+generated, and thus the transformation is not applicable. Otherwise,
+let R1 be the relation containing all aggregation columns, and R2, R3,
+... be the remaining relations. From the inner join case, under the
+aforementioned conditions, we have the equivalence:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 JOIN R3 ...)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 JOIN R3 ...)
+
+To preserve correctness when outer joins are involved, we require an
+additional condition:
+
+4) R1 must not be on the nullable side of any outer join.
+
+This condition ensures that partial aggregation over R1 does not
+suppress any null-extended rows that would be introduced by outer
+joins. If R1 is on the nullable side of an outer join, the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..4a65f955ca6 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -279,6 +279,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(joinrel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 6cc6966b060..ee298970427 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_base_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,11 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for base rels where possible.
+ */
+ setup_base_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +334,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_base_grouped_rels
+ * For each base relation, build a grouped base relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_base_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +603,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1358,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3335,6 +3418,344 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped base or join
+ * relation.
+ *
+ * The information needed is provided by the RelAggInfo structure.
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel, RelAggInfo *agg_info)
+{
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+ List *group_pathkeys = NIL;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping, and generate the pathkeys that represent the grouping
+ * requirements in that case.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+ if (can_sort)
+ {
+ RelOptInfo *top_grouped_rel;
+ List *top_group_tlist;
+
+ top_grouped_rel = IS_OTHER_REL(rel) ?
+ rel->top_parent->grouped_rel : grouped_rel;
+ top_group_tlist =
+ make_tlist_from_pathtarget(top_grouped_rel->agg_info->target);
+
+ group_pathkeys =
+ make_pathkeys_for_sortclauses(root, agg_info->group_clauses,
+ top_group_tlist);
+ }
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3494,6 +3915,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
@@ -3514,6 +3939,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(rel->relids, root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4383,6 +4829,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (!bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = child_rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel,
+ grouped_rel->agg_info);
+ set_cheapest(grouped_rel);
+ }
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
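
Reviewer's sketch (not part of the patch): the sorted-input handling in generate_grouped_paths() above reduces to a small decision per candidate input path. The following is a minimal, standalone illustration of that decision. The enum and function names are hypothetical, and the parameterized-path filter and the actual path construction are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; none of these exist in the patch. */
typedef enum
{
	INPUT_SKIP,					/* path is not considered at all */
	INPUT_USE_AS_IS,			/* already sorted on the grouping keys */
	INPUT_FULL_SORT,			/* an explicit Sort goes on top */
	INPUT_INCREMENTAL_SORT		/* an Incremental Sort goes on top */
} SortedInputChoice;

/*
 * Mirror the tests applied to each input path before a sorted partial
 * aggregate is built on top of it:
 *
 * - a fully sorted path is used directly;
 * - an unsorted path is kept only if it is the cheapest path, or if it has
 *   presorted keys and incremental sort is enabled;
 * - among kept unsorted paths, incremental sort is chosen whenever presorted
 *   keys are usable, otherwise a full sort.
 */
SortedInputChoice
choose_sorted_input(bool is_sorted, int presorted_keys,
					bool is_cheapest, bool incremental_sort_enabled)
{
	bool		can_use_presorted;

	can_use_presorted = (presorted_keys > 0 && incremental_sort_enabled);

	if (is_sorted)
		return INPUT_USE_AS_IS;
	if (!is_cheapest && !can_use_presorted)
		return INPUT_SKIP;
	return can_use_presorted ? INPUT_INCREMENTAL_SORT : INPUT_FULL_SORT;
}
```

For example, an unsorted, non-cheapest path with one presorted key yields INPUT_INCREMENTAL_SORT while enable_incremental_sort is on, and INPUT_SKIP once it is turned off.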
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 535248aa525..240eda53696 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -36,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -762,6 +766,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -873,6 +881,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel, rel1_empty == rel2_empty);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* Build the grouped relation. */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the lowest join level where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the lowest join level
+ * where partial aggregation is applied. Note that these values
+ * may be updated later if it is determined that grouped paths can
+ * be constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the lowest join level where partial aggregation is applied among
+ * the given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
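
Reviewer's sketch (not part of the patch): the "designated level" rule in make_grouped_join_rel() can be modelled with plain bitmasks. This is a toy under stated assumptions: RelidSet is a hypothetical stand-in for Relids (bit i means relation i), and accept_apply_level() is a made-up name that covers only the equal / subset / otherwise branches, not the size-estimate update.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for Relids: bit i set means relation i is a member. */
typedef uint32_t RelidSet;

/* True if every member of 'a' is also a member of 'b'. */
bool
relids_is_subset(RelidSet a, RelidSet b)
{
	return (a & ~b) == 0;
}

/*
 * Decide whether grouped join paths should be built for an input pair whose
 * grouped side applies partial aggregation at 'candidate'.
 *
 * - Same level as already recorded: build paths.
 * - Subset of the recorded level: narrow the recorded level to it, then
 *   build paths.
 * - Anything else: skip, so that all paths of the grouped relation keep
 *   aggregating at one level and therefore produce the same set of rows.
 */
bool
accept_apply_level(RelidSet *recorded, RelidSet candidate)
{
	if (candidate == *recorded)
		return true;

	if (relids_is_subset(candidate, *recorded))
	{
		*recorded = candidate;
		return true;
	}

	return false;
}
```

With A, B, C as bits 0, 1, 2, this follows the example in the code comment: a recorded level of (A B C), 0x7, is narrowed to (B C), 0x6, when (A JOIN (B JOIN C)) is considered, and a later candidate of (A B), 0x3, is rejected.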
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..1af43bb60d2 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,12 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
+static EquivalenceClass *get_eclass_for_sortgroupclause(PlannerInfo *root,
+ SortGroupClause *sgc,
+ Expr *expr);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +636,377 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in the targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Checks if any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate has an INTERNAL transition type. Although INTERNAL
+ * is marked as pass-by-value, it usually points to a large internal data
+ * structure (like those used by string_agg or array_agg). These transition
+ * states can grow large and their size is hard to estimate. Applying eager
+ * aggregation in such cases risks high memory usage since partial aggregation
+ * results might be stored in join hash tables or materialized nodes.
+ *
+ * We explicitly exclude aggregates with an AVG_ACCUM transition function from
+ * this check, based on the assumption that avg() and sum() are safe in this
+ * context.
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM ||
+ transinfo->transfn_oid == F_INT8_AVG_ACCUM)
+ continue;
+
+ if (transinfo->aggtranstype == INTERNALOID)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *ecs = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ ecs = lappend(ecs, get_eclass_for_sortgroupclause(root, sgc, tle->expr));
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, ecs)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->ec = ec;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
+/*
+ * get_eclass_for_sortgroupclause
+ * Given a group clause and an expression, find an existing equivalence
+ * class that the expression is a member of; return NULL if none.
+ */
+static EquivalenceClass *
+get_eclass_for_sortgroupclause(PlannerInfo *root, SortGroupClause *sgc,
+ Expr *expr)
+{
+ Oid opfamily,
+ opcintype,
+ collation;
+ CompareType cmptype;
+ Oid equality_op;
+ List *opfamilies;
+
+ /* Punt if the group clause is not sortable */
+ if (!OidIsValid(sgc->sortop))
+ return NULL;
+
+ /* Find the operator in pg_amop --- failure shouldn't happen */
+ if (!get_ordering_op_properties(sgc->sortop,
+ &opfamily, &opcintype, &cmptype))
+ elog(ERROR, "operator %u is not a valid ordering operator",
+ sgc->sortop);
+
+ /* Because SortGroupClause doesn't carry collation, consult the expr */
+ collation = exprCollation((Node *) expr);
+
+ /*
+ * EquivalenceClasses need to contain opfamily lists based on the family
+ * membership of mergejoinable equality operators, which could belong to
+ * more than one opfamily. So we have to look up the opfamily's equality
+ * operator and get its membership.
+ */
+ equality_op = get_opfamily_member_for_cmptype(opfamily,
+ opcintype,
+ opcintype,
+ COMPARE_EQ);
+ if (!OidIsValid(equality_op)) /* shouldn't happen */
+ elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
+ COMPARE_EQ, opcintype, opcintype, opfamily);
+ opfamilies = get_mergejoin_opfamilies(equality_op);
+ if (!opfamilies) /* certainly should find some */
+ elog(ERROR, "could not find opfamilies for equality operator %u",
+ equality_op);
+
+ /* Now find a matching EquivalenceClass */
+ return get_eclass_for_sort_expr(root, expr, opfamilies, opcintype,
+ collation, sgc->tleSortGroupRef,
+ NULL, false);
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
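
Reviewer's sketch (not part of the patch): the image-equality requirement in create_grouping_expr_infos() is easier to see with a toy decimal type. ToyDecimal below is hypothetical and is not how NUMERIC is stored; it only models a value that carries a display scale, so that 0 and 0.0 compare equal without being byte-for-byte identical. Overflow is ignored because the example uses small values only.

```c
#include <stdbool.h>

/* Toy decimal: the value is digits / 10^scale. Illustration only. */
typedef struct
{
	long		digits;
	int			scale;
} ToyDecimal;

/* Equality as an "=" operator would see it: compare the numeric values. */
bool
toy_equal(ToyDecimal a, ToyDecimal b)
{
	/* Bring both operands to the larger of the two scales. */
	while (a.scale < b.scale)
	{
		a.digits *= 10;
		a.scale++;
	}
	while (b.scale < a.scale)
	{
		b.digits *= 10;
		b.scale++;
	}
	return a.digits == b.digits;
}

/* Image equality: the stored representations must match exactly. */
bool
toy_image_equal(ToyDecimal a, ToyDecimal b)
{
	return a.digits == b.digits && a.scale == b.scale;
}
```

Here {0, 0} and {0, 1} (that is, 0 and 0.0) satisfy toy_equal() but not toy_image_equal(). If partial aggregation below a join collapsed them into one group, only one representation would survive, and an upper qual that depends on the scale could then see a different value than it would without eager aggregation.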
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41bd8353430..462c5335589 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -232,7 +232,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4010,9 +4009,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4094,23 +4091,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7055,16 +7045,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7177,7 +7193,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7203,7 +7219,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7245,19 +7261,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7297,6 +7311,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7307,6 +7322,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7328,11 +7352,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7557,6 +7583,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8120,13 +8191,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..69b8b0c2ae0 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,57 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = oldinfo->group_clauses;
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
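A note on the PathTarget branch above: it flat-copies the node with memcpy() and then allocates a fresh sortgrouprefs array, so the translated target does not alias the parent's array. The following is a minimal standalone sketch of that copy pattern, not planner code. ToyTarget and toy_target_copy are hypothetical names, and malloc() stands in for palloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PathTarget: scalar fields plus one owned array. */
typedef struct ToyTarget
{
	int			nexprs;			/* number of entries in sortgrouprefs */
	double		width;			/* a flat-copiable scalar field */
	unsigned int *sortgrouprefs;	/* may be NULL */
} ToyTarget;

/*
 * Copy a ToyTarget: flat-copy the scalars first, then duplicate the array
 * so that the copy owns its own storage.  Returns NULL on allocation
 * failure.
 */
ToyTarget *
toy_target_copy(const ToyTarget *old)
{
	ToyTarget  *copy;

	assert(old != NULL);

	copy = malloc(sizeof(ToyTarget));
	if (copy == NULL)
		return NULL;

	/* Copy all flat-copiable fields; the pointer is fixed up below. */
	memcpy(copy, old, sizeof(ToyTarget));
	copy->sortgrouprefs = NULL;

	if (old->sortgrouprefs != NULL && old->nexprs > 0)
	{
		size_t		nbytes = (size_t) old->nexprs * sizeof(unsigned int);

		copy->sortgrouprefs = malloc(nbytes);
		if (copy->sortgrouprefs == NULL)
		{
			free(copy);
			return NULL;
		}
		memcpy(copy->sortgrouprefs, old->sortgrouprefs, nbytes);
	}

	return copy;
}
```

Without the second allocation, a later change to the parent's array would silently show up in the child's target as well.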
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0e523d2eb5b..e5bab59fbbe 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -278,6 +292,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -408,6 +424,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * base relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions, otherwise we cannot reach here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this base
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel, true);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given base relation are not considered useful,
+ * skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Tracks the lowest join level at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -759,6 +872,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -945,6 +1060,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2523,3 +2640,536 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ *
+ * calculate_grouped_rows: if true, calculate the estimated number of grouped
+ * rows for the relation. If false, skip the estimation to avoid unnecessary
+ * planning overhead.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel,
+ bool calculate_grouped_rows)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its top parent must already
+ * have been created if that was possible. So we can just use the parent's
+ * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ if (calculate_grouped_rows)
+ {
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful
+ * iff the average group size is no less than
+ * min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+ }
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (list_length(group_clauses) == 0)
+ return NULL;
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+ result->apply_at = NULL; /* caller will change this later */
+
+ if (calculate_grouped_rows)
+ {
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
+ }
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is on the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated on this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ Assert(IsA(expr, Var));
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+ ListCell *lc1;
+
+ Assert(IsA(ge_info->expr, Var));
+ Assert(ge_info->sortgroupref > 0);
+
+ if (equal(expr, ge_info->expr))
+ return ge_info->sortgroupref;
+
+ if (ge_info->ec == NULL ||
+ !bms_is_member(((Var *) expr)->varno, ge_info->ec->ec_relids))
+ continue;
+
+ /*
+ * Scan the EquivalenceClass, looking for a match to the given
+ * expression. We ignore child members here.
+ */
+ foreach(lc1, ge_info->ec->ec_members)
+ {
+ EquivalenceMember *em = (EquivalenceMember *) lfirst(lc1);
+
+ /* Child members should not exist in ec_members */
+ Assert(!em->em_is_child);
+
+ if (equal(expr, em->em_expr))
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
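To illustrate why init_grouping_targets() adds join-needed Vars to the grouping keys, here is a toy standalone model, not planner code. It assumes a query of the shape "select t1.a, sum(t1.c) from t1 join t2 on t1.b = t2.b group by t1.a". All type and function names below are hypothetical. The eager variant groups t1 by (a, b) first, so every row in a partial group joins to exactly the same t2 rows, and the final sum matches the join-then-aggregate result.

```c
#include <assert.h>
#include <stddef.h>

/* Toy rows; table and column names are illustrative only. */
typedef struct { int a; int b; long c; } T1Row;
typedef struct { int b; } T2Row;
typedef struct { int a; int b; long partial_sum; } PartialGroup;

#define MAX_GROUPS 16

/* Lazy plan: join t1 to t2 on b, then compute sum(c) for one value of a. */
long
sum_after_join(const T1Row *t1, size_t n1, const T2Row *t2, size_t n2, int a)
{
	long		total = 0;

	for (size_t i = 0; i < n1; i++)
		for (size_t j = 0; j < n2; j++)
			if (t1[i].a == a && t1[i].b == t2[j].b)
				total += t1[i].c;
	return total;
}

/*
 * Eager plan, step 1: partially aggregate t1 by (a, b).  The join key b is
 * part of the grouping key.  "out" must have room for n1 entries.
 */
size_t
partial_aggregate(const T1Row *t1, size_t n1, PartialGroup *out)
{
	size_t		ngroups = 0;

	for (size_t i = 0; i < n1; i++)
	{
		size_t		g;

		/* Linear search for an existing (a, b) group. */
		for (g = 0; g < ngroups; g++)
			if (out[g].a == t1[i].a && out[g].b == t1[i].b)
				break;
		if (g == ngroups)
		{
			out[g].a = t1[i].a;
			out[g].b = t1[i].b;
			out[g].partial_sum = 0;
			ngroups++;
		}
		out[g].partial_sum += t1[i].c;
	}
	return ngroups;
}

/* Eager plan, step 2: join the partial groups to t2, then finalize. */
long
sum_eager(const T1Row *t1, size_t n1, const T2Row *t2, size_t n2, int a)
{
	PartialGroup groups[MAX_GROUPS];
	size_t		ngroups;
	long		total = 0;

	assert(n1 <= MAX_GROUPS);
	ngroups = partial_aggregate(t1, n1, groups);

	for (size_t g = 0; g < ngroups; g++)
		for (size_t j = 0; j < n2; j++)
			if (groups[g].a == a && groups[g].b == t2[j].b)
				total += groups[g].partial_sum;
	return total;
}
```

With t1 = {(1,10,5), (1,10,7), (1,20,3), (2,10,4)} and t2 = {10, 10, 30}, both variants yield 24 for a = 1. Grouping t1 by a alone would merge the b = 10 and b = 20 rows into one partial group, and there would be no join key left to decide which part of that group matches t2.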
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 6bc6be13d2a..b176d5130e4 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -145,6 +145,13 @@
boot_val => 'false',
},
+{ name => 'enable_eager_aggregate', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+ short_desc => 'Enables eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'enable_eager_aggregate',
+ boot_val => 'true',
+},
+
{ name => 'enable_parallel_append', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
short_desc => 'Enables the planner\'s use of parallel append plans.',
flags => 'GUC_EXPLAIN',
@@ -2427,6 +2434,15 @@
max => 'DBL_MAX',
},
+{ name => 'min_eager_agg_group_size', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_COST',
+ short_desc => 'Sets the minimum average group size required to consider applying eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'min_eager_agg_group_size',
+ boot_val => '8.0',
+ min => '0.0',
+ max => 'DBL_MAX',
+},
+
{ name => 'cursor_tuple_fraction', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_OTHER',
short_desc => 'Sets the planner\'s estimate of the fraction of a cursor\'s rows that will be retrieved.',
flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c36fcb9ab61..c5d612ab552 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
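For reference, the way min_eager_agg_group_size is consulted in create_rel_agg_info() comes down to one comparison: the average partial group size, rel->rows divided by the estimated number of groups, must be at least the GUC value. A minimal standalone sketch of that test follows. The function name is hypothetical, and plain doubles stand in for the planner's Cardinality fields.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true if the average group size (rows per group) is
 * no less than min_group_size.  grouped_rows must be positive.
 */
bool
eager_agg_group_size_ok(double rows, double grouped_rows,
						double min_group_size)
{
	assert(grouped_rows > 0.0);

	return (rows / grouped_rows) >= min_group_size;
}
```

With the default of 8.0, a relation of 1000 rows estimated to form 100 groups passes (average group size 10), while the same relation estimated to form 200 groups does not (average group size 5).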
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b12a2508d8c..2786f8f0c4d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -391,6 +391,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupingExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1040,6 +1049,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1124,6 +1141,67 @@ typedef struct RelOptInfo
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create grouped paths for base and join rels.
+ *
+ * "target" is the output tlist for the grouped paths.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "group_clauses" and "group_exprs" are lists of SortGroupClauses and the
+ * corresponding grouping expressions.
+ *
+ * "apply_at" tracks the lowest join level at which partial aggregation is
+ * applied.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /*
+ * default result targetlist for Paths scanning this grouped relation;
+ * list of Vars/Exprs, cost, width
+ */
+ struct PathTarget *target;
+
+ /*
+ * the targetlist for Paths that provide input to the grouped paths
+ */
+ struct PathTarget *agg_input;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+
+ /* lowest level partial aggregation is applied at */
+ Relids apply_at;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* the grouped paths are considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3268,6 +3346,49 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the EquivalenceClass it belongs to. This information is necessary to
+ * reproduce correct grouping semantics at different levels of the join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* the equivalence class the expression belongs to */
+ EquivalenceClass *ec pg_node_attr(copy_as_scalar, equal_as_scalar);
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
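The agg_eval_at field of AggClauseInfo is what eager_aggregation_possible_for_relation() tests with bms_is_subset(): every aggregate must be evaluable using only the relations in rel->relids. A toy sketch of that check follows, with a 32-bit mask standing in for a Bitmapset. ToyRelids and both function names are hypothetical, and the real code operates on Relids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: bit N set means relid N is a member. */
typedef uint32_t ToyRelids;

/* True if every member of "a" is also a member of "b". */
static bool
toy_is_subset(ToyRelids a, ToyRelids b)
{
	return (a & ~b) == 0;
}

/*
 * True if all aggregates can be evaluated at a relation covering "relids",
 * i.e. no aggregate needs a relation outside of it.
 */
bool
toy_aggs_evaluable_at(const ToyRelids *agg_eval_at, size_t naggs,
					  ToyRelids relids)
{
	assert(agg_eval_at != NULL || naggs == 0);

	for (size_t i = 0; i < naggs; i++)
	{
		if (!toy_is_subset(agg_eval_at[i], relids))
			return false;
	}
	return true;
}
```

For example, with one aggregate over relation 1 and another over relations 1 and 2, the check fails for the base relation {1} and passes for the join {1, 2}.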
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 763cd25bb3c..e509b8144ce 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -312,6 +312,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
+ RelOptInfo *rel_plain);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -351,4 +355,6 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel,
+ bool calculate_grouped_rows);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cbade77b717..8d03d662a04 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root,
+ RelOptInfo *rel_grouped,
+ RelOptInfo *rel_plain,
+ RelAggInfo *agg_info);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..0dab585e9ce
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1584 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+--
+-- Test eager aggregation with GEQO
+--
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index cd37f549b5a..bdbf21a874d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2840,20 +2840,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index cb12bf53719..fc84929a002 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..8b1049ae3f3
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,225 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+--
+-- Test eager aggregation with GEQO
+--
+
+SET geqo = on;
+SET geqo_threshold = 2;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 3c80d49b67e..09752d57da4 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2473,6 +2475,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.39.5 (Apple Git-154)
[application/octet-stream] v23-0002-Allow-negative-aggtransspace-to-indicate-unbound.patch (8.0K, 3-v23-0002-Allow-negative-aggtransspace-to-indicate-unbound.patch)
download | inline diff:
From 48b807a93c29c534c0151b950563b28021acd8c1 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 12 Sep 2025 13:11:47 +0900
Subject: [PATCH v23 2/2] Allow negative aggtransspace to indicate unbounded
state size
This patch reuses the existing aggtransspace in pg_aggregate to
signal that an aggregate's transition state can grow unboundedly. If
aggtransspace is set to a negative value, it now indicates that the
transition state may consume unpredictable or large amounts of memory,
such as in aggregates like array_agg or string_agg that accumulate
input rows.
This information can be used by the planner to avoid applying
memory-sensitive optimizations (e.g., eager aggregation) when there is
a risk of excessive memory usage during partial aggregation.
Bump catalog version.
---
doc/src/sgml/catalogs.sgml | 5 ++++-
doc/src/sgml/ref/create_aggregate.sgml | 11 ++++++++---
src/backend/optimizer/plan/initsplan.c | 23 +++++++----------------
src/include/catalog/pg_aggregate.dat | 10 ++++++----
src/test/regress/expected/opr_sanity.out | 2 +-
src/test/regress/sql/opr_sanity.sql | 2 +-
6 files changed, 27 insertions(+), 26 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e9095bedf21..3acc2222a87 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -596,7 +596,10 @@
</para>
<para>
Approximate average size (in bytes) of the transition state
- data, or zero to use a default estimate
+ data. A positive value provides an estimate; zero means to
+ use a default estimate. A negative value indicates the state
+ data can grow unboundedly in size, such as when the aggregate
+ accumulates input rows (e.g., array_agg, string_agg).
</para></entry>
</row>
diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index 222e0aa5c9d..0472ac2e874 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -384,9 +384,13 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state value.
If this parameter is omitted or is zero, a default estimate is used
- based on the <replaceable>state_data_type</replaceable>.
+ based on the <replaceable>state_data_type</replaceable>. If set to a
+ negative value, it indicates the state data can grow unboundedly in
+ size, such as when the aggregate accumulates input rows (e.g.,
+ array_agg, string_agg).
The planner uses this value to estimate the memory required for a
- grouped aggregate query.
+ grouped aggregate query and to avoid optimizations that may cause
+ excessive memory usage.
</para>
</listitem>
</varlistentry>
@@ -568,7 +572,8 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state
value, when using moving-aggregate mode. This works the same as
- <replaceable>state_data_size</replaceable>.
+ <replaceable>state_data_size</replaceable>, except that negative
+ values are not used to indicate unbounded state size.
</para>
</listitem>
</varlistentry>
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 1af43bb60d2..b8d1c7e88a3 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -719,19 +719,14 @@ setup_eager_aggregation(PlannerInfo *root)
/*
* is_partial_agg_memory_risky
- * Checks if any aggregate poses a risk of excessive memory usage during
+ * Check if any aggregate poses a risk of excessive memory usage during
* partial aggregation.
*
- * We check if any aggregate uses INTERNAL transition type. Although INTERNAL
- * is marked as pass-by-value, it usually points to a large internal data
- * structure (like those used by string_agg or array_agg). These transition
- * states can grow large and their size is hard to estimate. Applying eager
- * aggregation in such cases risks high memory usage since partial aggregation
- * results might be stored in join hash tables or materialized nodes.
- *
- * We explicitly exclude aggregates with AVG_ACCUM transition function from
- * this check, based on the assumption that avg() and sum() are safe in this
- * context.
+ * We check if any aggregate has a negative aggtransspace value, which
+ * indicates that its transition state data can grow unboundedly in size.
+ * Applying eager aggregation in such cases risks high memory usage since
+ * partial aggregation results might be stored in join hash tables or
+ * materialized nodes.
*/
static bool
is_partial_agg_memory_risky(PlannerInfo *root)
@@ -742,11 +737,7 @@ is_partial_agg_memory_risky(PlannerInfo *root)
{
AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
- if (transinfo->transfn_oid == F_NUMERIC_AVG_ACCUM ||
- transinfo->transfn_oid == F_INT8_AVG_ACCUM)
- continue;
-
- if (transinfo->aggtranstype == INTERNALOID)
+ if (transinfo->aggtransspace < 0)
return true;
}
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index d6aa1f6ec47..870769e8f14 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -558,26 +558,28 @@
aggfinalfn => 'array_agg_finalfn', aggcombinefn => 'array_agg_combine',
aggserialfn => 'array_agg_serialize',
aggdeserialfn => 'array_agg_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
{ aggfnoid => 'array_agg(anyarray)', aggtransfn => 'array_agg_array_transfn',
aggfinalfn => 'array_agg_array_finalfn',
aggcombinefn => 'array_agg_array_combine',
aggserialfn => 'array_agg_array_serialize',
aggdeserialfn => 'array_agg_array_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
# text
{ aggfnoid => 'string_agg(text,text)', aggtransfn => 'string_agg_transfn',
aggfinalfn => 'string_agg_finalfn', aggcombinefn => 'string_agg_combine',
aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# bytea
{ aggfnoid => 'string_agg(bytea,bytea)',
aggtransfn => 'bytea_string_agg_transfn',
aggfinalfn => 'bytea_string_agg_finalfn',
aggcombinefn => 'string_agg_combine', aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# range
{ aggfnoid => 'range_intersect_agg(anyrange)',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 20bf9ea9cdf..a357e1d0c0e 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -1470,7 +1470,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
ctid | aggfnoid
------+----------
(0 rows)
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index 2fb3a852878..cd674d7dbca 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -847,7 +847,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
-- Make sure the matching pg_proc entry is sensible, too.
--
2.39.5 (Apple Git-154)
* Re: Eager aggregation, take 3
@ 2025-09-29 02:09 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 2 replies; 55+ messages in thread
From: Richard Guo @ 2025-09-29 02:09 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Thu, Sep 25, 2025 at 1:23 PM Richard Guo <[email protected]> wrote:
> Attached is an updated version of the patch with these optimizations
> applied.
FWIW, I plan to do another self-review of this patch soon, with the
goal of assessing whether it's ready to be pushed. If anyone has any
concerns about any part of the patch or would like to review it, I
would greatly appreciate hearing from you.
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-01 23:54 Matheus Alcantara <[email protected]>
parent: Richard Guo <[email protected]>
1 sibling, 1 reply; 55+ messages in thread
From: Matheus Alcantara @ 2025-10-01 23:54 UTC (permalink / raw)
To: Richard Guo <[email protected]>; Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
[ getting back to testing this patch ...]
In reply to my last email, you wrote:
>> Debugging this query shows that all if conditions on
>> setup_eager_aggregation() returns false and create_agg_clause_infos()
>> and create_grouping_expr_infos() are called. The RelAggInfo->agg_useful
>> is also being set to true so I would expect to see Finalize and Partial
>> agg nodes, is this correct or am I missing something here?
>
> Well, just because eager aggregation *can* be applied does not mean
> that it *will* be; it depends on whether it produces a lower-cost
> execution plan. This transformation is cost-based, so it's not the
> right mindset to assume that it will always be applied when possible.
>
Sorry for the noise here. I didn't consider the costs.
On Sun Sep 28, 2025 at 11:09 PM -03, Richard Guo wrote:
> On Thu, Sep 25, 2025 at 1:23 PM Richard Guo <[email protected]> wrote:
>> Attached is an updated version of the patch with these optimizations
>> applied.
>
> FWIW, I plan to do another self-review of this patch soon, with the
> goal of assessing whether it's ready to be pushed. If anyone has any
> concerns about any part of the patch or would like to review it, I
> would greatly appreciate hearing from you.
>
I spent some time testing patch v23 using the TPC-DS benchmark and am
seeing worse execution times when using eager aggregation.
The most interesting cases are:
Query    | planning time | execution time
---------+---------------+---------------
query 31 |        -2.03% |        -99.56%
query 71 |       -15.51% |        -68.88%
query 20 |       -10.77% |        -32.40%
query 26 |       -28.01% |        -32.35%
query 85 |       -10.57% |        -31.91%
query 77 |       -30.07% |        -31.38%
query 69 |       -32.79% |        -29.21%
query 32 |       -68.48% |        -27.89%
query 57 |        -7.99% |        -27.32%
query 91 |       -24.81% |        -26.20%
query 23 |       -11.72% |        -18.24%
Query 31 looks particularly bad. I don't know if I'm doing something
completely wrong, but I just set up a TPC-DS database, executed the query
on master and with the v23 patch, and got these results:
Master:
Planning Time: 3.191 ms
Execution Time: 16950.619 ms
Patch:
Planning Time: 3.257 ms
Execution Time: 3848355.646 ms
Note that I executed ANALYZE before running the queries in both
scenarios (master and patched).
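For reference, the query 31 row in the table is consistent with computing
each percentage as (master - patched) / patched, so negative values mean the
patched run was slower. Below is a minimal sketch of that arithmetic. The
formula is an assumption inferred from the numbers above, not taken from the
benchmark tooling, and `relative_change` is a hypothetical helper name.

```python
def relative_change(master_ms: float, patched_ms: float) -> float:
    """Return (master - patched) / patched as a percentage.

    Assumed formula: negative values mean the patched run took longer
    than master.
    """
    return (master_ms - patched_ms) / patched_ms * 100.0


# Timings for query 31 as reported above (milliseconds).
planning = relative_change(3.191, 3.257)
execution = relative_change(16950.619, 3848355.646)

# Plain ratio of patched to master execution time.
slowdown = 3848355.646 / 16950.619

print(f"planning:  {planning:.2f}%")
print(f"execution: {execution:.2f}%")
print(f"slowdown:  {slowdown:.0f}x")
```

Under this reading, the patched run of query 31 took roughly 227 times as
long as master.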
I'm attaching the EXPLAIN (ANALYZE) output for query 31 from master
and with the patch applied.
Please let me know if there is any other test that I can run to
benchmark this patch.
--
Matheus Alcantara
│ Sort (cost=656889.77..656889.77 rows=1 width=210) (actual time=17164.506..17164.519 rows=43.00 loops=1) │
│ Sort Key: ((ss3.store_sales / ss2.store_sales)) │
│ Sort Method: quicksort Memory: 28kB │
│ Buffers: shared hit=6533 read=69203, temp read=4343 written=12055 │
│ CTE ss │
│ -> HashAggregate (cost=323021.86..377372.99 rows=1476800 width=54) (actual time=3389.564..3677.220 rows=35136.00 loops=1) │
│ Group Key: customer_address.ca_county, date_dim.d_qoy, date_dim.d_year │
│ Planned Partitions: 64 Batches: 65 Memory Usage: 8209kB Disk Usage: 56840kB │
│ Buffers: shared hit=3408 read=50944, temp read=3962 written=10947 │
│ -> Hash Join (cost=5328.60..100701.93 rows=2625180 width=28) (actual time=46.394..2034.907 rows=2685273.00 loops=1) │
│ Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk) │
│ Buffers: shared hit=3408 read=50944 │
│ -> Hash Join (cost=2261.00..90416.35 rows=2749551 width=24) (actual time=18.753..1396.048 rows=2750429.00 loops=1) │
│ Hash Cond: (store_sales.ss_addr_sk = customer_address.ca_address_sk) │
│ Buffers: shared hit=1984 read=50944 │
│ -> Seq Scan on store_sales (cost=0.00..80594.17 rows=2880217 width=14) (actual time=0.063..228.063 rows=2880404.00 loops=1) │
│ Buffers: shared hit=848 read=50944 │
│ -> Hash (cost=1636.00..1636.00 rows=50000 width=18) (actual time=18.651..18.651 rows=50000.00 loops=1) │
│ Buckets: 65536 Batches: 1 Memory Usage: 3052kB │
│ Buffers: shared hit=1136 │
│ -> Seq Scan on customer_address (cost=0.00..1636.00 rows=50000 width=18) (actual time=0.005..9.555 rows=50000.00 loops=1) │
│ Buffers: shared hit=1136 │
│ -> Hash (cost=2154.49..2154.49 rows=73049 width=12) (actual time=27.627..27.629 rows=73049.00 loops=1) │
│ Buckets: 131072 Batches: 1 Memory Usage: 4163kB │
│ Buffers: shared hit=1424 │
│ -> Seq Scan on date_dim (cost=0.00..2154.49 rows=73049 width=12) (actual time=0.009..15.154 rows=73049.00 loops=1) │
│ Buffers: shared hit=1424 │
│ CTE ws │
│ -> HashAggregate (cost=96009.03..114825.35 rows=718952 width=54) (actual time=977.215..1014.889 rows=23320.00 loops=1) │
│ Group Key: customer_address_1.ca_county, date_dim_1.d_qoy, date_dim_1.d_year │
│ Planned Partitions: 32 Batches: 33 Memory Usage: 8209kB Disk Usage: 6032kB │
│ Buffers: shared hit=3125 read=18259, temp read=381 written=1108 │
│ -> Hash Join (cost=5328.60..35122.78 rows=718952 width=28) (actual time=46.623..611.054 rows=719118.00 loops=1) │
│ Hash Cond: (web_sales.ws_bill_addr_sk = customer_address_1.ca_address_sk) │
│ Buffers: shared hit=3125 read=18259 │
│ -> Hash Join (cost=3067.60..30973.94 rows=719120 width=18) (actual time=27.691..424.273 rows=719195.00 loops=1) │
│ Hash Cond: (web_sales.ws_sold_date_sk = date_dim_1.d_date_sk) │
│ Buffers: shared hit=1989 read=18259 │
│ -> Seq Scan on web_sales (cost=0.00..26017.84 rows=719384 width=14) (actual time=0.082..63.389 rows=719384.00 loops=1) │
│ Buffers: shared hit=565 read=18259 │
│ -> Hash (cost=2154.49..2154.49 rows=73049 width=12) (actual time=27.538..27.538 rows=73049.00 loops=1) │
│ Buckets: 131072 Batches: 1 Memory Usage: 4163kB │
│ Buffers: shared hit=1424 │
│ -> Seq Scan on date_dim date_dim_1 (cost=0.00..2154.49 rows=73049 width=12) (actual time=0.006..14.914 rows=73049.00 loops=1) │
│ Buffers: shared hit=1424 │
│ -> Hash (cost=1636.00..1636.00 rows=50000 width=18) (actual time=18.902..18.902 rows=50000.00 loops=1) │
│ Buckets: 65536 Batches: 1 Memory Usage: 3052kB │
│ Buffers: shared hit=1136 │
│ -> Seq Scan on customer_address customer_address_1 (cost=0.00..1636.00 rows=50000 width=18) (actual time=0.008..9.727 rows=50000.00 loops=1) │
│ Buffers: shared hit=1136 │
│ -> Nested Loop (cost=0.00..164691.41 rows=1 width=210) (actual time=4817.695..17164.430 rows=43.00 loops=1) │
│ Join Filter: (((ss1.ca_county)::text = (ws2.ca_county)::text) AND (CASE WHEN (ws1.web_sales > '0'::numeric) THEN (ws2.web_sales / ws1.web_sales) ELSE NULL::numeric END > CASE WHEN (ss1.store_sales > '0'::numeric) THEN (ss2.store_sales / ss1.store_sales) ELSE NULL::numeric END) AND (CASE WHEN (ws2.web_sales > '0'::numeric) THEN (ws3.web_sales / ws2.web_sales) ELSE NULL::numeric END > CASE WHEN (ss2.store_sales > '0'::numeric) THEN (ss3.store_sales / ss2.store_sales) ELSE NULL::numeric END)) │
│ Rows Removed by Join Filter: 527207 │
│ Buffers: shared hit=6533 read=69203, temp read=4343 written=12055 │
│ -> Nested Loop (cost=0.00..146716.93 rows=1 width=554) (actual time=4671.968..15501.760 rows=570.00 loops=1) │
│ Join Filter: ((ss1.ca_county)::text = (ss3.ca_county)::text) │
│ Rows Removed by Join Filter: 1038674 │
│ Buffers: shared hit=6533 read=69203, temp read=4343 written=12055 │
│ -> Nested Loop (cost=0.00..109796.47 rows=1 width=444) (actual time=4669.164..12922.095 rows=578.00 loops=1) │
│ Join Filter: ((ss1.ca_county)::text = (ss2.ca_county)::text) │
│ Rows Removed by Join Filter: 1008217 │
│ Buffers: shared hit=6533 read=69203, temp read=3559 written=12055 │
│ -> Nested Loop (cost=0.00..72876.00 rows=1 width=334) (actual time=4666.835..10231.481 rows=617.00 loops=1) │
│ Join Filter: ((ss1.ca_county)::text = (ws1.ca_county)::text) │
│ Rows Removed by Join Filter: 1089697 │
│ Buffers: shared hit=6533 read=69203, temp read=3559 written=12055 │
│ -> Nested Loop (cost=0.00..35954.71 rows=2 width=220) (actual time=1031.594..3687.112 rows=662.00 loops=1) │
│ Join Filter: ((ws1.ca_county)::text = (ws3.ca_county)::text) │
│ Rows Removed by Join Filter: 1148109 │
│ Buffers: shared hit=3125 read=18259, temp read=381 written=1108 │
│ -> CTE Scan on ws ws1 (cost=0.00..17973.80 rows=18 width=110) (actual time=977.224..980.082 rows=911.00 loops=1) │
│ Filter: ((d_qoy = 1) AND (d_year = 1999)) │
│ Rows Removed by Filter: 22409 │
│ Storage: Memory Maximum Storage: 1700kB │
│ Buffers: shared hit=3125 read=18259, temp written=1107 │
│ -> CTE Scan on ws ws3 (cost=0.00..17973.80 rows=18 width=110) (actual time=0.005..2.857 rows=1261.00 loops=911) │
│ Filter: ((d_year = 1999) AND (d_qoy = 3)) │
│ Rows Removed by Filter: 22059 │
│ Storage: Memory Maximum Storage: 1700kB │
│ Buffers: temp read=381 written=1 │
│ -> CTE Scan on ss ss1 (cost=0.00..36920.00 rows=37 width=114) (actual time=5.121..9.740 rows=1647.00 loops=662) │
│ Filter: ((d_qoy = 1) AND (d_year = 1999)) │
│ Rows Removed by Filter: 33489 │
│ Storage: Memory Maximum Storage: 2636kB │
│ Buffers: shared hit=3408 read=50944, temp read=3178 written=10947 │
│ -> CTE Scan on ss ss2 (cost=0.00..36920.00 rows=37 width=110) (actual time=0.001..4.216 rows=1635.00 loops=617) │
│ Filter: ((d_year = 1999) AND (d_qoy = 2)) │
│ Rows Removed by Filter: 33501 │
│ Storage: Memory Maximum Storage: 2636kB │
│ -> CTE Scan on ss ss3 (cost=0.00..36920.00 rows=37 width=110) (actual time=0.006..4.305 rows=1798.00 loops=578) │
│ Filter: ((d_year = 1999) AND (d_qoy = 3)) │
│ Rows Removed by Filter: 33338 │
│ Storage: Memory Maximum Storage: 2636kB │
│ Buffers: temp read=784 │
│ -> CTE Scan on ws ws2 (cost=0.00..17973.80 rows=18 width=110) (actual time=0.001..2.810 rows=925.00 loops=570) │
│ Filter: ((d_year = 1999) AND (d_qoy = 2)) │
│ Rows Removed by Filter: 22395 │
│ Storage: Memory Maximum Storage: 1700kB │
│ Planning: │
│ Buffers: shared hit=12 │
│ Planning Time: 2.180 ms │
│ Execution Time: 17166.558 ms │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
│ Sort (cost=302668.66..302668.66 rows=1 width=210) (actual time=3825537.172..3825541.540 rows=43.00 loops=1) │
│ Sort Key: ((ss3.store_sales / ss2.store_sales)) │
│ Sort Method: quicksort Memory: 28kB │
│ Buffers: shared hit=21757 read=69012, temp read=14486 written=25552 │
│ CTE ss │
│ -> Finalize GroupAggregate (cost=178135.51..215272.86 rows=262517 width=54) (actual time=1471.638..1733.635 rows=35117.00 loops=1) │
│ Group Key: customer_address.ca_county, date_dim.d_qoy, date_dim.d_year │
│ Buffers: shared hit=3533 read=50849, temp read=14486 written=25552 │
│ -> Gather Merge (cost=178135.51..208709.94 rows=262517 width=54) (actual time=1471.627..1586.417 rows=234867.00 loops=1) │
│ Workers Planned: 2 │
│ Workers Launched: 2 │
│ Buffers: shared hit=3533 read=50849, temp read=14486 written=25552 │
│ -> Sort (cost=177135.48..177408.94 rows=109382 width=54) (actual time=1463.292..1497.110 rows=78658.67 loops=3) │
│ Sort Key: customer_address.ca_county, date_dim.d_qoy, date_dim.d_year │
│ Sort Method: external merge Disk: 7944kB │
│ Buffers: shared hit=3533 read=50849, temp read=14486 written=25552 │
│ Worker 0: Sort Method: external merge Disk: 8000kB │
│ Worker 1: Sort Method: external merge Disk: 7928kB │
│ -> Parallel Hash Join (cost=147862.49..164239.25 rows=109382 width=54) (actual time=839.965..1235.101 rows=80523.33 loops=3) │
│ Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk) │
│ Buffers: shared hit=3503 read=50849, temp read=11502 written=22562 │
│ -> Parallel Hash Join (cost=145471.66..161547.68 rows=114565 width=50) (actual time=820.740..1192.922 rows=96392.00 loops=3) │
│ Hash Cond: (store_sales.ss_addr_sk = customer_address.ca_address_sk) │
│ Buffers: shared hit=2079 read=50849, temp read=11502 written=22562 │
│ -> Partial HashAggregate (cost=143673.89..158993.80 rows=288022 width=40) (actual time=810.581..1155.245 rows=98213.67 loops=3) │
│ Group Key: store_sales.ss_sold_date_sk, store_sales.ss_addr_sk │
│ Planned Partitions: 16 Batches: 17 Memory Usage: 8337kB Disk Usage: 31640kB │
│ Buffers: shared hit=943 read=50849, temp read=11502 written=22562 │
│ Worker 0: Batches: 17 Memory Usage: 8337kB Disk Usage: 31760kB │
│ Worker 1: Batches: 17 Memory Usage: 8337kB Disk Usage: 31640kB │
│ -> Parallel Seq Scan on store_sales (cost=0.00..63792.90 rows=1200090 width=14) (actual time=0.126..79.442 rows=960134.67 loops=3) │
│ Buffers: shared hit=943 read=50849 │
│ -> Parallel Hash (cost=1430.12..1430.12 rows=29412 width=18) (actual time=10.036..10.038 rows=16666.67 loops=3) │
│ Buckets: 65536 Batches: 1 Memory Usage: 3264kB │
│ Buffers: shared hit=1136 │
│ -> Parallel Seq Scan on customer_address (cost=0.00..1430.12 rows=29412 width=18) (actual time=0.007..5.102 rows=16666.67 loops=3) │
│ Buffers: shared hit=1136 │
│ -> Parallel Hash (cost=1853.70..1853.70 rows=42970 width=12) (actual time=19.092..19.094 rows=24349.67 loops=3) │
│ Buckets: 131072 Batches: 1 Memory Usage: 4512kB │
│ Buffers: shared hit=1424 │
│ -> Parallel Seq Scan on date_dim (cost=0.00..1853.70 rows=42970 width=12) (actual time=0.012..10.264 rows=24349.67 loops=3) │
│ Buffers: shared hit=1424 │
│ CTE ws │
│ -> Finalize GroupAggregate (cost=52144.19..62314.79 rows=71894 width=54) (actual time=275.121..340.107 rows=23312.00 loops=1) │
│ Group Key: customer_address_1.ca_county, date_dim_1.d_qoy, date_dim_1.d_year │
│ Buffers: shared hit=18224 read=18163 │
│ -> Gather Merge (cost=52144.19..60517.44 rows=71894 width=54) (actual time=275.107..297.072 rows=60190.00 loops=1) │
│ Workers Planned: 2 │
│ Workers Launched: 2 │
│ Buffers: shared hit=18224 read=18163 │
│ -> Sort (cost=51144.17..51219.06 rows=29956 width=54) (actual time=271.870..272.906 rows=20293.33 loops=3) │
│ Sort Key: customer_address_1.ca_county, date_dim_1.d_qoy, date_dim_1.d_year │
│ Sort Method: quicksort Memory: 2931kB │
│ Buffers: shared hit=18224 read=18163 │
│ Worker 0: Sort Method: quicksort Memory: 2938kB │
│ Worker 1: Sort Method: quicksort Memory: 2955kB │
│ -> Nested Loop (cost=43571.15..48916.86 rows=29956 width=54) (actual time=184.657..215.740 rows=20419.67 loops=3) │
│ Buffers: shared hit=18194 read=18163 │
│ -> Parallel Hash Join (cost=43570.84..47586.10 rows=29967 width=50) (actual time=184.630..201.358 rows=20451.00 loops=3) │
│ Hash Cond: (web_sales.ws_bill_addr_sk = customer_address_1.ca_address_sk) │
│ Buffers: shared hit=1797 read=18163 │
│ -> Partial HashAggregate (cost=41773.08..45599.48 rows=71938 width=40) (actual time=177.706..188.464 rows=20477.33 loops=3) │
│ Group Key: web_sales.ws_sold_date_sk, web_sales.ws_bill_addr_sk │
│ Planned Partitions: 4 Batches: 1 Memory Usage: 7953kB │
│ Buffers: shared hit=661 read=18163 │
│ Worker 0: Batches: 1 Memory Usage: 7953kB │
│ Worker 1: Batches: 1 Memory Usage: 7953kB │
│ -> Parallel Seq Scan on web_sales (cost=0.00..21821.43 rows=299743 width=14) (actual time=0.106..23.122 rows=239794.67 loops=3) │
│ Buffers: shared hit=661 read=18163 │
│ -> Parallel Hash (cost=1430.12..1430.12 rows=29412 width=18) (actual time=6.846..6.847 rows=16666.67 loops=3) │
│ Buckets: 65536 Batches: 1 Memory Usage: 3264kB │
│ Buffers: shared hit=1136 │
│ -> Parallel Seq Scan on customer_address customer_address_1 (cost=0.00..1430.12 rows=29412 width=18) (actual time=0.008..3.586 rows=16666.67 loops=3) │
│ Buffers: shared hit=1136 │
│ -> Memoize (cost=0.30..0.33 rows=1 width=12) (actual time=0.000..0.000 rows=1.00 loops=61353) │
│ Cache Key: web_sales.ws_sold_date_sk │
│ Cache Mode: logical │
│ Estimates: capacity=1822 distinct keys=1822 lookups=29967 hit percent=93.92% │
│ Hits: 18542 Misses: 1824 Evictions: 0 Overflows: 0 Memory Usage: 200kB │
│ Buffers: shared hit=16397 │
│ Worker 0: Hits: 18589 Misses: 1821 Evictions: 0 Overflows: 0 Memory Usage: 200kB │
│ Worker 1: Hits: 18754 Misses: 1823 Evictions: 0 Overflows: 0 Memory Usage: 200kB │
│ -> Index Scan using date_dim_pkey on date_dim date_dim_1 (cost=0.29..0.32 rows=1 width=12) (actual time=0.002..0.002 rows=1.00 loops=5468) │
│ Index Cond: (d_date_sk = web_sales.ws_sold_date_sk) │
│ Index Searches: 5465 │
│ Buffers: shared hit=16397 │
│ -> Nested Loop (cost=0.00..25081.00 rows=1 width=210) (actual time=43808.287..3825536.966 rows=43.00 loops=1) │
│ Join Filter: (((ss1.ca_county)::text = (ss2.ca_county)::text) AND (CASE WHEN (ws1.web_sales > '0'::numeric) THEN (ws2.web_sales / ws1.web_sales) ELSE NULL::numeric END > CASE WHEN (ss1.store_sales > '0'::numeric) THEN (ss2.store_sales / ss1.store_sales) ELSE NULL::numeric END)) │
│ Rows Removed by Join Filter: 226832 │
│ Buffers: shared hit=7500 read=22936, temp read=4819 written=8505 │
│ -> Merge Join (cost=0.00..8360.31 rows=1 width=224) (actual time=1747.759..1760.887 rows=825.00 loops=1) │
│ Merge Cond: ((ss1.ca_county)::text = (ws1.ca_county)::text) │
│ Buffers: shared hit=7500 read=22936, temp read=4321 written=8505 │
│ -> CTE Scan on ss ss1 (cost=0.00..6562.93 rows=7 width=114) (actual time=1471.648..1477.297 rows=1647.00 loops=1) │
│ Filter: ((d_qoy = 1) AND (d_year = 1999)) │
│ Rows Removed by Filter: 33470 │
│ Storage: Memory Maximum Storage: 2635kB │
│ Buffers: shared hit=1278 read=16903, temp read=4321 written=8505 │
│ -> Materialize (cost=0.00..1797.36 rows=2 width=110) (actual time=275.335..280.952 rows=911.00 loops=1) │
│ Storage: Memory Maximum Storage: 17kB │
│ Buffers: shared hit=6222 read=6033 │
│ -> CTE Scan on ws ws1 (cost=0.00..1797.35 rows=2 width=110) (actual time=275.333..279.774 rows=911.00 loops=1) │
│ Filter: ((d_qoy = 1) AND (d_year = 1999)) │
│ Rows Removed by Filter: 22390 │
│ Storage: Memory Maximum Storage: 1700kB │
│ Buffers: shared hit=6222 read=6033 │
│ -> Nested Loop (cost=0.00..16720.65 rows=1 width=440) (actual time=5.913..4634.838 rows=275.00 loops=825) │
│ Join Filter: (((ss2.ca_county)::text = (ss3.ca_county)::text) AND (CASE WHEN (ws2.web_sales > '0'::numeric) THEN (ws3.web_sales / ws2.web_sales) ELSE NULL::numeric END > CASE WHEN (ss2.store_sales > '0'::numeric) THEN (ss3.store_sales / ss2.store_sales) ELSE NULL::numeric END)) │
│ Rows Removed by Join Filter: 1037001 │
│ Buffers: temp read=498 │
│ -> Merge Join (cost=0.00..8360.31 rows=1 width=220) (actual time=0.001..5.266 rows=844.00 loops=825) │
│ Merge Cond: ((ss2.ca_county)::text = (ws2.ca_county)::text) │
│ -> CTE Scan on ss ss2 (cost=0.00..6562.93 rows=7 width=110) (actual time=0.001..4.131 rows=1634.00 loops=825) │
│ Filter: ((d_year = 1999) AND (d_qoy = 2)) │
│ Rows Removed by Filter: 33468 │
│ Storage: Memory Maximum Storage: 2635kB │
│ -> Materialize (cost=0.00..1797.36 rows=2 width=110) (actual time=0.000..0.053 rows=925.00 loops=825) │
│ Storage: Memory Maximum Storage: 74kB │
│ -> CTE Scan on ws ws2 (cost=0.00..1797.35 rows=2 width=110) (actual time=0.001..2.784 rows=925.00 loops=1) │
│ Filter: ((d_year = 1999) AND (d_qoy = 2)) │
│ Rows Removed by Filter: 22382 │
│ Storage: Memory Maximum Storage: 1700kB │
│ -> Merge Join (cost=0.00..8360.31 rows=1 width=220) (actual time=0.002..5.383 rows=1229.00 loops=696300) │
│ Merge Cond: ((ss3.ca_county)::text = (ws3.ca_county)::text) │
│ Buffers: temp read=498 │
│ -> CTE Scan on ss ss3 (cost=0.00..6562.93 rows=7 width=110) (actual time=0.001..4.051 rows=1796.00 loops=696300) │
│ Filter: ((d_year = 1999) AND (d_qoy = 3)) │
│ Rows Removed by Filter: 33292 │
│ Storage: Memory Maximum Storage: 2635kB │
│ Buffers: temp read=498 │
│ -> Materialize (cost=0.00..1797.36 rows=2 width=110) (actual time=0.000..0.047 rows=1261.00 loops=696300) │
│ Storage: Memory Maximum Storage: 95kB │
│ -> CTE Scan on ws ws3 (cost=0.00..1797.35 rows=2 width=110) (actual time=0.001..74.725 rows=1261.00 loops=1) │
│ Filter: ((d_year = 1999) AND (d_qoy = 3)) │
│ Rows Removed by Filter: 22051 │
│ Storage: Memory Maximum Storage: 1700kB │
│ Planning: │
│ Buffers: shared hit=12 │
│ Planning Time: 4.951 ms │
│ Execution Time: 3825542.556 ms │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Attachments:
[text/plain] query-31.master.explain (50.6K, 2-query-31.master.explain)
download | inline:
│ Sort (cost=656889.77..656889.77 rows=1 width=210) (actual time=17164.506..17164.519 rows=43.00 loops=1) │
│ Sort Key: ((ss3.store_sales / ss2.store_sales)) │
│ Sort Method: quicksort Memory: 28kB │
│ Buffers: shared hit=6533 read=69203, temp read=4343 written=12055 │
│ CTE ss │
│ -> HashAggregate (cost=323021.86..377372.99 rows=1476800 width=54) (actual time=3389.564..3677.220 rows=35136.00 loops=1) │
│ Group Key: customer_address.ca_county, date_dim.d_qoy, date_dim.d_year │
│ Planned Partitions: 64 Batches: 65 Memory Usage: 8209kB Disk Usage: 56840kB │
│ Buffers: shared hit=3408 read=50944, temp read=3962 written=10947 │
│ -> Hash Join (cost=5328.60..100701.93 rows=2625180 width=28) (actual time=46.394..2034.907 rows=2685273.00 loops=1) │
│ Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk) │
│ Buffers: shared hit=3408 read=50944 │
│ -> Hash Join (cost=2261.00..90416.35 rows=2749551 width=24) (actual time=18.753..1396.048 rows=2750429.00 loops=1) │
│ Hash Cond: (store_sales.ss_addr_sk = customer_address.ca_address_sk) │
│ Buffers: shared hit=1984 read=50944 │
│ -> Seq Scan on store_sales (cost=0.00..80594.17 rows=2880217 width=14) (actual time=0.063..228.063 rows=2880404.00 loops=1) │
│ Buffers: shared hit=848 read=50944 │
│ -> Hash (cost=1636.00..1636.00 rows=50000 width=18) (actual time=18.651..18.651 rows=50000.00 loops=1) │
│ Buckets: 65536 Batches: 1 Memory Usage: 3052kB │
│ Buffers: shared hit=1136 │
│ -> Seq Scan on customer_address (cost=0.00..1636.00 rows=50000 width=18) (actual time=0.005..9.555 rows=50000.00 loops=1) │
│ Buffers: shared hit=1136 │
│ -> Hash (cost=2154.49..2154.49 rows=73049 width=12) (actual time=27.627..27.629 rows=73049.00 loops=1) │
│ Buckets: 131072 Batches: 1 Memory Usage: 4163kB │
│ Buffers: shared hit=1424 │
│ -> Seq Scan on date_dim (cost=0.00..2154.49 rows=73049 width=12) (actual time=0.009..15.154 rows=73049.00 loops=1) │
│ Buffers: shared hit=1424 │
│ CTE ws │
│ -> HashAggregate (cost=96009.03..114825.35 rows=718952 width=54) (actual time=977.215..1014.889 rows=23320.00 loops=1) │
│ Group Key: customer_address_1.ca_county, date_dim_1.d_qoy, date_dim_1.d_year │
│ Planned Partitions: 32 Batches: 33 Memory Usage: 8209kB Disk Usage: 6032kB │
│ Buffers: shared hit=3125 read=18259, temp read=381 written=1108 │
│ -> Hash Join (cost=5328.60..35122.78 rows=718952 width=28) (actual time=46.623..611.054 rows=719118.00 loops=1) │
│ Hash Cond: (web_sales.ws_bill_addr_sk = customer_address_1.ca_address_sk) │
│ Buffers: shared hit=3125 read=18259 │
│ -> Hash Join (cost=3067.60..30973.94 rows=719120 width=18) (actual time=27.691..424.273 rows=719195.00 loops=1) │
│ Hash Cond: (web_sales.ws_sold_date_sk = date_dim_1.d_date_sk) │
│ Buffers: shared hit=1989 read=18259 │
│ -> Seq Scan on web_sales (cost=0.00..26017.84 rows=719384 width=14) (actual time=0.082..63.389 rows=719384.00 loops=1) │
│ Buffers: shared hit=565 read=18259 │
│ -> Hash (cost=2154.49..2154.49 rows=73049 width=12) (actual time=27.538..27.538 rows=73049.00 loops=1) │
│ Buckets: 131072 Batches: 1 Memory Usage: 4163kB │
│ Buffers: shared hit=1424 │
│ -> Seq Scan on date_dim date_dim_1 (cost=0.00..2154.49 rows=73049 width=12) (actual time=0.006..14.914 rows=73049.00 loops=1) │
│ Buffers: shared hit=1424 │
│ -> Hash (cost=1636.00..1636.00 rows=50000 width=18) (actual time=18.902..18.902 rows=50000.00 loops=1) │
│ Buckets: 65536 Batches: 1 Memory Usage: 3052kB │
│ Buffers: shared hit=1136 │
│ -> Seq Scan on customer_address customer_address_1 (cost=0.00..1636.00 rows=50000 width=18) (actual time=0.008..9.727 rows=50000.00 loops=1) │
│ Buffers: shared hit=1136 │
│ -> Nested Loop (cost=0.00..164691.41 rows=1 width=210) (actual time=4817.695..17164.430 rows=43.00 loops=1) │
│ Join Filter: (((ss1.ca_county)::text = (ws2.ca_county)::text) AND (CASE WHEN (ws1.web_sales > '0'::numeric) THEN (ws2.web_sales / ws1.web_sales) ELSE NULL::numeric END > CASE WHEN (ss1.store_sales > '0'::numeric) THEN (ss2.store_sales / ss1.store_sales) ELSE NULL::numeric END) AND (CASE WHEN (ws2.web_sales > '0'::numeric) THEN (ws3.web_sales / ws2.web_sales) ELSE NULL::numeric END > CASE WHEN (ss2.store_sales > '0'::numeric) THEN (ss3.store_sales / ss2.store_sales) ELSE NULL::numeric END)) │
│ Rows Removed by Join Filter: 527207 │
│ Buffers: shared hit=6533 read=69203, temp read=4343 written=12055 │
│ -> Nested Loop (cost=0.00..146716.93 rows=1 width=554) (actual time=4671.968..15501.760 rows=570.00 loops=1) │
│ Join Filter: ((ss1.ca_county)::text = (ss3.ca_county)::text) │
│ Rows Removed by Join Filter: 1038674 │
│ Buffers: shared hit=6533 read=69203, temp read=4343 written=12055 │
│ -> Nested Loop (cost=0.00..109796.47 rows=1 width=444) (actual time=4669.164..12922.095 rows=578.00 loops=1) │
│ Join Filter: ((ss1.ca_county)::text = (ss2.ca_county)::text) │
│ Rows Removed by Join Filter: 1008217 │
│ Buffers: shared hit=6533 read=69203, temp read=3559 written=12055 │
│ -> Nested Loop (cost=0.00..72876.00 rows=1 width=334) (actual time=4666.835..10231.481 rows=617.00 loops=1) │
│ Join Filter: ((ss1.ca_county)::text = (ws1.ca_county)::text) │
│ Rows Removed by Join Filter: 1089697 │
│ Buffers: shared hit=6533 read=69203, temp read=3559 written=12055 │
│ -> Nested Loop (cost=0.00..35954.71 rows=2 width=220) (actual time=1031.594..3687.112 rows=662.00 loops=1) │
│ Join Filter: ((ws1.ca_county)::text = (ws3.ca_county)::text) │
│ Rows Removed by Join Filter: 1148109 │
│ Buffers: shared hit=3125 read=18259, temp read=381 written=1108 │
│ -> CTE Scan on ws ws1 (cost=0.00..17973.80 rows=18 width=110) (actual time=977.224..980.082 rows=911.00 loops=1) │
│ Filter: ((d_qoy = 1) AND (d_year = 1999)) │
│ Rows Removed by Filter: 22409 │
│ Storage: Memory Maximum Storage: 1700kB │
│ Buffers: shared hit=3125 read=18259, temp written=1107 │
│ -> CTE Scan on ws ws3 (cost=0.00..17973.80 rows=18 width=110) (actual time=0.005..2.857 rows=1261.00 loops=911) │
│ Filter: ((d_year = 1999) AND (d_qoy = 3)) │
│ Rows Removed by Filter: 22059 │
│ Storage: Memory Maximum Storage: 1700kB │
│ Buffers: temp read=381 written=1 │
│ -> CTE Scan on ss ss1 (cost=0.00..36920.00 rows=37 width=114) (actual time=5.121..9.740 rows=1647.00 loops=662) │
│ Filter: ((d_qoy = 1) AND (d_year = 1999)) │
│ Rows Removed by Filter: 33489 │
│ Storage: Memory Maximum Storage: 2636kB │
│ Buffers: shared hit=3408 read=50944, temp read=3178 written=10947 │
│ -> CTE Scan on ss ss2 (cost=0.00..36920.00 rows=37 width=110) (actual time=0.001..4.216 rows=1635.00 loops=617) │
│ Filter: ((d_year = 1999) AND (d_qoy = 2)) │
│ Rows Removed by Filter: 33501 │
│ Storage: Memory Maximum Storage: 2636kB │
│ -> CTE Scan on ss ss3 (cost=0.00..36920.00 rows=37 width=110) (actual time=0.006..4.305 rows=1798.00 loops=578) │
│ Filter: ((d_year = 1999) AND (d_qoy = 3)) │
│ Rows Removed by Filter: 33338 │
│ Storage: Memory Maximum Storage: 2636kB │
│ Buffers: temp read=784 │
│ -> CTE Scan on ws ws2 (cost=0.00..17973.80 rows=18 width=110) (actual time=0.001..2.810 rows=925.00 loops=570) │
│ Filter: ((d_year = 1999) AND (d_qoy = 2)) │
│ Rows Removed by Filter: 22395 │
│ Storage: Memory Maximum Storage: 1700kB │
│ Planning: │
│ Buffers: shared hit=12 │
│ Planning Time: 2.180 ms │
│ Execution Time: 17166.558 ms │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
[text/plain] query-31.patch.explain (42.0K, 3-query-31.patch.explain)
download | inline:
│ Sort (cost=302668.66..302668.66 rows=1 width=210) (actual time=3825537.172..3825541.540 rows=43.00 loops=1) │
│ Sort Key: ((ss3.store_sales / ss2.store_sales)) │
│ Sort Method: quicksort Memory: 28kB │
│ Buffers: shared hit=21757 read=69012, temp read=14486 written=25552 │
│ CTE ss │
│ -> Finalize GroupAggregate (cost=178135.51..215272.86 rows=262517 width=54) (actual time=1471.638..1733.635 rows=35117.00 loops=1) │
│ Group Key: customer_address.ca_county, date_dim.d_qoy, date_dim.d_year │
│ Buffers: shared hit=3533 read=50849, temp read=14486 written=25552 │
│ -> Gather Merge (cost=178135.51..208709.94 rows=262517 width=54) (actual time=1471.627..1586.417 rows=234867.00 loops=1) │
│ Workers Planned: 2 │
│ Workers Launched: 2 │
│ Buffers: shared hit=3533 read=50849, temp read=14486 written=25552 │
│ -> Sort (cost=177135.48..177408.94 rows=109382 width=54) (actual time=1463.292..1497.110 rows=78658.67 loops=3) │
│ Sort Key: customer_address.ca_county, date_dim.d_qoy, date_dim.d_year │
│ Sort Method: external merge Disk: 7944kB │
│ Buffers: shared hit=3533 read=50849, temp read=14486 written=25552 │
│ Worker 0: Sort Method: external merge Disk: 8000kB │
│ Worker 1: Sort Method: external merge Disk: 7928kB │
│ -> Parallel Hash Join (cost=147862.49..164239.25 rows=109382 width=54) (actual time=839.965..1235.101 rows=80523.33 loops=3) │
│ Hash Cond: (store_sales.ss_sold_date_sk = date_dim.d_date_sk) │
│ Buffers: shared hit=3503 read=50849, temp read=11502 written=22562 │
│ -> Parallel Hash Join (cost=145471.66..161547.68 rows=114565 width=50) (actual time=820.740..1192.922 rows=96392.00 loops=3) │
│ Hash Cond: (store_sales.ss_addr_sk = customer_address.ca_address_sk) │
│ Buffers: shared hit=2079 read=50849, temp read=11502 written=22562 │
│ -> Partial HashAggregate (cost=143673.89..158993.80 rows=288022 width=40) (actual time=810.581..1155.245 rows=98213.67 loops=3) │
│ Group Key: store_sales.ss_sold_date_sk, store_sales.ss_addr_sk │
│ Planned Partitions: 16 Batches: 17 Memory Usage: 8337kB Disk Usage: 31640kB │
│ Buffers: shared hit=943 read=50849, temp read=11502 written=22562 │
│ Worker 0: Batches: 17 Memory Usage: 8337kB Disk Usage: 31760kB │
│ Worker 1: Batches: 17 Memory Usage: 8337kB Disk Usage: 31640kB │
│ -> Parallel Seq Scan on store_sales (cost=0.00..63792.90 rows=1200090 width=14) (actual time=0.126..79.442 rows=960134.67 loops=3) │
│ Buffers: shared hit=943 read=50849 │
│ -> Parallel Hash (cost=1430.12..1430.12 rows=29412 width=18) (actual time=10.036..10.038 rows=16666.67 loops=3) │
│ Buckets: 65536 Batches: 1 Memory Usage: 3264kB │
│ Buffers: shared hit=1136 │
│ -> Parallel Seq Scan on customer_address (cost=0.00..1430.12 rows=29412 width=18) (actual time=0.007..5.102 rows=16666.67 loops=3) │
│ Buffers: shared hit=1136 │
│ -> Parallel Hash (cost=1853.70..1853.70 rows=42970 width=12) (actual time=19.092..19.094 rows=24349.67 loops=3) │
│ Buckets: 131072 Batches: 1 Memory Usage: 4512kB │
│ Buffers: shared hit=1424 │
│ -> Parallel Seq Scan on date_dim (cost=0.00..1853.70 rows=42970 width=12) (actual time=0.012..10.264 rows=24349.67 loops=3) │
│ Buffers: shared hit=1424 │
│ CTE ws │
│ -> Finalize GroupAggregate (cost=52144.19..62314.79 rows=71894 width=54) (actual time=275.121..340.107 rows=23312.00 loops=1) │
│ Group Key: customer_address_1.ca_county, date_dim_1.d_qoy, date_dim_1.d_year │
│ Buffers: shared hit=18224 read=18163 │
│ -> Gather Merge (cost=52144.19..60517.44 rows=71894 width=54) (actual time=275.107..297.072 rows=60190.00 loops=1) │
│ Workers Planned: 2 │
│ Workers Launched: 2 │
│ Buffers: shared hit=18224 read=18163 │
│ -> Sort (cost=51144.17..51219.06 rows=29956 width=54) (actual time=271.870..272.906 rows=20293.33 loops=3) │
│ Sort Key: customer_address_1.ca_county, date_dim_1.d_qoy, date_dim_1.d_year │
│ Sort Method: quicksort Memory: 2931kB │
│ Buffers: shared hit=18224 read=18163 │
│ Worker 0: Sort Method: quicksort Memory: 2938kB │
│ Worker 1: Sort Method: quicksort Memory: 2955kB │
│ -> Nested Loop (cost=43571.15..48916.86 rows=29956 width=54) (actual time=184.657..215.740 rows=20419.67 loops=3) │
│ Buffers: shared hit=18194 read=18163 │
│ -> Parallel Hash Join (cost=43570.84..47586.10 rows=29967 width=50) (actual time=184.630..201.358 rows=20451.00 loops=3) │
│ Hash Cond: (web_sales.ws_bill_addr_sk = customer_address_1.ca_address_sk) │
│ Buffers: shared hit=1797 read=18163 │
│ -> Partial HashAggregate (cost=41773.08..45599.48 rows=71938 width=40) (actual time=177.706..188.464 rows=20477.33 loops=3) │
│ Group Key: web_sales.ws_sold_date_sk, web_sales.ws_bill_addr_sk │
│ Planned Partitions: 4 Batches: 1 Memory Usage: 7953kB │
│ Buffers: shared hit=661 read=18163 │
│ Worker 0: Batches: 1 Memory Usage: 7953kB │
│ Worker 1: Batches: 1 Memory Usage: 7953kB │
│ -> Parallel Seq Scan on web_sales (cost=0.00..21821.43 rows=299743 width=14) (actual time=0.106..23.122 rows=239794.67 loops=3) │
│ Buffers: shared hit=661 read=18163 │
│ -> Parallel Hash (cost=1430.12..1430.12 rows=29412 width=18) (actual time=6.846..6.847 rows=16666.67 loops=3) │
│ Buckets: 65536 Batches: 1 Memory Usage: 3264kB │
│ Buffers: shared hit=1136 │
│ -> Parallel Seq Scan on customer_address customer_address_1 (cost=0.00..1430.12 rows=29412 width=18) (actual time=0.008..3.586 rows=16666.67 loops=3) │
│ Buffers: shared hit=1136 │
│ -> Memoize (cost=0.30..0.33 rows=1 width=12) (actual time=0.000..0.000 rows=1.00 loops=61353) │
│ Cache Key: web_sales.ws_sold_date_sk │
│ Cache Mode: logical │
│ Estimates: capacity=1822 distinct keys=1822 lookups=29967 hit percent=93.92% │
│ Hits: 18542 Misses: 1824 Evictions: 0 Overflows: 0 Memory Usage: 200kB │
│ Buffers: shared hit=16397 │
│ Worker 0: Hits: 18589 Misses: 1821 Evictions: 0 Overflows: 0 Memory Usage: 200kB │
│ Worker 1: Hits: 18754 Misses: 1823 Evictions: 0 Overflows: 0 Memory Usage: 200kB │
│ -> Index Scan using date_dim_pkey on date_dim date_dim_1 (cost=0.29..0.32 rows=1 width=12) (actual time=0.002..0.002 rows=1.00 loops=5468) │
│ Index Cond: (d_date_sk = web_sales.ws_sold_date_sk) │
│ Index Searches: 5465 │
│ Buffers: shared hit=16397 │
│ -> Nested Loop (cost=0.00..25081.00 rows=1 width=210) (actual time=43808.287..3825536.966 rows=43.00 loops=1) │
│ Join Filter: (((ss1.ca_county)::text = (ss2.ca_county)::text) AND (CASE WHEN (ws1.web_sales > '0'::numeric) THEN (ws2.web_sales / ws1.web_sales) ELSE NULL::numeric END > CASE WHEN (ss1.store_sales > '0'::numeric) THEN (ss2.store_sales / ss1.store_sales) ELSE NULL::numeric END)) │
│ Rows Removed by Join Filter: 226832 │
│ Buffers: shared hit=7500 read=22936, temp read=4819 written=8505 │
│ -> Merge Join (cost=0.00..8360.31 rows=1 width=224) (actual time=1747.759..1760.887 rows=825.00 loops=1) │
│ Merge Cond: ((ss1.ca_county)::text = (ws1.ca_county)::text) │
│ Buffers: shared hit=7500 read=22936, temp read=4321 written=8505 │
│ -> CTE Scan on ss ss1 (cost=0.00..6562.93 rows=7 width=114) (actual time=1471.648..1477.297 rows=1647.00 loops=1) │
│ Filter: ((d_qoy = 1) AND (d_year = 1999)) │
│ Rows Removed by Filter: 33470 │
│ Storage: Memory Maximum Storage: 2635kB │
│ Buffers: shared hit=1278 read=16903, temp read=4321 written=8505 │
│ -> Materialize (cost=0.00..1797.36 rows=2 width=110) (actual time=275.335..280.952 rows=911.00 loops=1) │
│ Storage: Memory Maximum Storage: 17kB │
│ Buffers: shared hit=6222 read=6033 │
│ -> CTE Scan on ws ws1 (cost=0.00..1797.35 rows=2 width=110) (actual time=275.333..279.774 rows=911.00 loops=1) │
│ Filter: ((d_qoy = 1) AND (d_year = 1999)) │
│ Rows Removed by Filter: 22390 │
│ Storage: Memory Maximum Storage: 1700kB │
│ Buffers: shared hit=6222 read=6033 │
│ -> Nested Loop (cost=0.00..16720.65 rows=1 width=440) (actual time=5.913..4634.838 rows=275.00 loops=825) │
│ Join Filter: (((ss2.ca_county)::text = (ss3.ca_county)::text) AND (CASE WHEN (ws2.web_sales > '0'::numeric) THEN (ws3.web_sales / ws2.web_sales) ELSE NULL::numeric END > CASE WHEN (ss2.store_sales > '0'::numeric) THEN (ss3.store_sales / ss2.store_sales) ELSE NULL::numeric END)) │
│ Rows Removed by Join Filter: 1037001 │
│ Buffers: temp read=498 │
│ -> Merge Join (cost=0.00..8360.31 rows=1 width=220) (actual time=0.001..5.266 rows=844.00 loops=825) │
│ Merge Cond: ((ss2.ca_county)::text = (ws2.ca_county)::text) │
│ -> CTE Scan on ss ss2 (cost=0.00..6562.93 rows=7 width=110) (actual time=0.001..4.131 rows=1634.00 loops=825) │
│ Filter: ((d_year = 1999) AND (d_qoy = 2)) │
│ Rows Removed by Filter: 33468 │
│ Storage: Memory Maximum Storage: 2635kB │
│ -> Materialize (cost=0.00..1797.36 rows=2 width=110) (actual time=0.000..0.053 rows=925.00 loops=825) │
│ Storage: Memory Maximum Storage: 74kB │
│ -> CTE Scan on ws ws2 (cost=0.00..1797.35 rows=2 width=110) (actual time=0.001..2.784 rows=925.00 loops=1) │
│ Filter: ((d_year = 1999) AND (d_qoy = 2)) │
│ Rows Removed by Filter: 22382 │
│ Storage: Memory Maximum Storage: 1700kB │
│ -> Merge Join (cost=0.00..8360.31 rows=1 width=220) (actual time=0.002..5.383 rows=1229.00 loops=696300) │
│ Merge Cond: ((ss3.ca_county)::text = (ws3.ca_county)::text) │
│ Buffers: temp read=498 │
│ -> CTE Scan on ss ss3 (cost=0.00..6562.93 rows=7 width=110) (actual time=0.001..4.051 rows=1796.00 loops=696300) │
│ Filter: ((d_year = 1999) AND (d_qoy = 3)) │
│ Rows Removed by Filter: 33292 │
│ Storage: Memory Maximum Storage: 2635kB │
│ Buffers: temp read=498 │
│ -> Materialize (cost=0.00..1797.36 rows=2 width=110) (actual time=0.000..0.047 rows=1261.00 loops=696300) │
│ Storage: Memory Maximum Storage: 95kB │
│ -> CTE Scan on ws ws3 (cost=0.00..1797.35 rows=2 width=110) (actual time=0.001..74.725 rows=1261.00 loops=1) │
│ Filter: ((d_year = 1999) AND (d_qoy = 3)) │
│ Rows Removed by Filter: 22051 │
│ Storage: Memory Maximum Storage: 1700kB │
│ Planning: │
│ Buffers: shared hit=12 │
│ Planning Time: 4.951 ms │
│ Execution Time: 3825542.556 ms │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
* Re: Eager aggregation, take 3
@ 2025-10-02 01:13 Richard Guo <[email protected]>
parent: Matheus Alcantara <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-10-02 01:13 UTC (permalink / raw)
To: Matheus Alcantara <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Thu, Oct 2, 2025 at 8:55 AM Matheus Alcantara
<[email protected]> wrote:
> The query 31 seems bad, I don't know if I'm doing something completely
> wrong but I've just setup a TPC-DS database and then executed the query
> on master and with the v23 patch and I got these results:
>
> Master:
> Planning Time: 3.191 ms
> Execution Time: 16950.619 ms
>
> Patch:
> Planning Time: 3.257 ms
> Execution Time: 3848355.646 ms
Thanks for reporting this. It does seem odd. I checked the TPC-DS
benchmarking on v13 and found that the execution time for query 31,
with and without eager aggregation, is as follows:
        EAGER-AGG-OFF     EAGER-AGG-ON
q31     10463.536 ms      10244.175 ms
There appears to be a regression between v13 and v23. Looking into
it...
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-02 01:39 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-10-02 01:39 UTC (permalink / raw)
To: Matheus Alcantara <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Thu, Oct 2, 2025 at 10:13 AM Richard Guo <[email protected]> wrote:
> On Thu, Oct 2, 2025 at 8:55 AM Matheus Alcantara
> <[email protected]> wrote:
> > The query 31 seems bad, I don't know if I'm doing something completely
> > wrong but I've just setup a TPC-DS database and then executed the query
> > on master and with the v23 patch and I got these results:
> >
> > Master:
> > Planning Time: 3.191 ms
> > Execution Time: 16950.619 ms
> >
> > Patch:
> > Planning Time: 3.257 ms
> > Execution Time: 3848355.646 ms
> Thanks for reporting this. It does seem odd. I checked the TPC-DS
> benchmarking on v13 and found that the execution time for query 31,
> with and without eager aggregation, is as follows:
>
>         EAGER-AGG-OFF     EAGER-AGG-ON
> q31     10463.536 ms      10244.175 ms
>
> There appears to be a regression between v13 and v23. Looking into
> it...
I noticed something interesting while comparing the two EXPLAIN
(ANALYZE) outputs: the patched version uses parallel plans, whereas
the master does not. To rule that out as a factor, I ran "SET
max_parallel_workers_per_gather TO 0;" and re-ran query 31 on both
master and the patched version. This time, I got a positive result.
-- on master
Planning Time: 5.281 ms
Execution Time: 7222.665 ms
-- on patched
Planning Time: 4.855 ms
Execution Time: 5977.287 ms
It seems eager aggregation doesn't cope well with parallel plans for
this query. Looking into it.
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-02 08:49 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-10-02 08:49 UTC (permalink / raw)
To: Matheus Alcantara <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Thu, Oct 2, 2025 at 10:39 AM Richard Guo <[email protected]> wrote:
> It seems eager aggregation doesn't cope well with parallel plans for
> this query. Looking into it.
It turns out that this is not related to parallel plans but rather to
poor size estimates.
Looking at query 31, it involves joining 6 base relations, all of
which are CTE references (i.e., RTE_CTE relations) to two different
CTEs. Each CTE involves aggregations and GROUP BY clauses.
Unfortunately, our size estimates for CTE relations are quite poor,
especially when the CTE uses GROUP BY. In these cases, we don't have
any ANALYZE statistics available (cf. examine_simple_variable). As a
result, when computing the selectivity of the CTE relation's qual
clauses, we have to fall back on default values. For example, for
quals like "CTE.var = const", which are used a lot in query 31, the
selectivity is computed as "1.0 / DEFAULT_NUM_DISTINCT(200)", with the
assumption that there are DEFAULT_NUM_DISTINCT distinct values in the
relation, and that these values are equally common (cf. var_eq_const).
The consequence is that the size estimates are significantly different
from the actual values. For example, from the EXPLAIN(ANALYZE) output
provided by Matheus:
-> CTE Scan on ws ws3 (cost=0.00..1797.35 rows=2 width=110)
(actual time=0.001..74.725 rows=1261.00 loops=1)
Filter: ((d_year = 1999) AND (d_qoy = 3))
Interestingly, with eager aggregation applied, the row count estimates
for the two CTE plans actually become closer to the actual values.
-- without eager aggregation
CTE ws
-> HashAggregate (cost=96009.03..114825.35 rows=718952 width=54)
(actual time=977.215..1014.889 rows=23320.00 loops=1)
-- with eager aggregation
CTE ws
-> Finalize GroupAggregate (cost=52144.19..62314.79 rows=71894 width=54)
(actual time=275.121..340.107 rows=23312.00 loops=1)
However, due to the highly underestimated selectivity for the qual
clauses, the row count estimates for CTE Scan nodes become worse.
This is because:
-- without eager aggregation
718952 * (1.0/200) * (1.0/200) ~= 18
-- with eager aggregation
71894 * (1.0/200) * (1.0/200) ~= 2
... while the actual row count is 1261.00 as shown above.
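To make the arithmetic above easy to replay, here is a minimal Python sketch. It is not planner code: the function name `estimate_cte_scan_rows` is hypothetical, and the rounding plus the floor of one row are simplifying assumptions made for illustration. The only inputs taken from this thread are the default of 200 distinct values, the two CTE plan row counts, and the actual row count of 1261.

```python
# Illustrative only: replays the arithmetic quoted above; this is not
# PostgreSQL's planner code.
DEFAULT_NUM_DISTINCT = 200  # default assumed when no statistics are available


def estimate_cte_scan_rows(cte_plan_rows: float, num_eq_quals: int) -> int:
    """Estimated rows for a CTE scan filtered by `num_eq_quals` quals of the
    form "CTE.var = const", each given selectivity 1/DEFAULT_NUM_DISTINCT."""
    selectivity = (1.0 / DEFAULT_NUM_DISTINCT) ** num_eq_quals
    # Assumption for this sketch: round to a whole row, never below one.
    return max(1, round(cte_plan_rows * selectivity))


ACTUAL_ROWS = 1261  # actual rows for "CTE Scan on ws ws3" in the quoted output

for label, plan_rows in [("without eager aggregation", 718952),
                         ("with eager aggregation", 71894)]:
    est = estimate_cte_scan_rows(plan_rows, num_eq_quals=2)
    ratio = ACTUAL_ROWS / est
    print(f"{label}: estimated {est} rows, actual {ACTUAL_ROWS} "
          f"(underestimated by a factor of about {ratio:.0f})")
```

The sketch shows why a more accurate CTE plan row count can still yield a worse scan estimate: the same fixed selectivity is applied to a smaller input.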
That is to say, on master, the CTE plan rows are overestimated while
the selectivity estimates are severely underestimated. With eager
aggregation, the CTE plan rows become closer to the actual values, but
the selectivity estimates remain equally underestimated. As a result,
the row count estimates for the CTE Scan nodes worsen with eager
aggregation. This causes the join order in the final plan to change
when eager aggregation is applied, leading to longer execution times
in this case.
Another point to note is that, due to severely underestimated
selectivity estimates (0.000025, sometimes 0.000000125), the size
estimates for the CTE relations are very small, causing the planner to
tend to choose nestloops. I tried manually disabling nestloop, and
here is what I got for query 31.
-- on master, set enable_nestloop to on;
Planning Time: 4.613 ms
Execution Time: 7142.090 ms
-- on master, set enable_nestloop to off;
Planning Time: 4.315 ms
Execution Time: 2262.330 ms
-- on patched, set enable_nestloop to off;
Planning Time: 4.321 ms
Execution Time: 1214.376 ms
That is, on master, simply disabling nestloop makes query 31 run more
than 3 times faster. Enabling eager aggregation on top of that
improves performance further, making it run 1.86 times faster relative
to the nested-loop-disabled baseline.
I manually disabled nested loops for other TPC-DS queries on master
and found some more interesting results.
For query 4, on master:
-- set enable_nestloop to on
Planning Time: 3.054 ms
Execution Time: 3231356.258 ms
-- set enable_nestloop to off
Planning Time: 4.291 ms
Execution Time: 12751.170 ms
That is, on master, simply disabling nestloop makes query 4 run more
than 253 times faster.
For query 11, on master:
-- set enable_nestloop to on
Planning Time: 1.435 ms
Execution Time: 1824860.937 ms
-- set enable_nestloop to off
Planning Time: 2.479 ms
Execution Time: 7984.360 ms
Disabling nestloop makes query 11 run more than 228 times faster.
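The speedup factors quoted above can be rechecked from the reported execution times with a short Python sketch. The helper name `speedup` and the dictionary layout are hypothetical; the timings are copied from this message.

```python
# Recompute the speedup factors from the execution times reported above.
# (nestloop on, nestloop off), in milliseconds, all measured on master.
master_timings_ms = {
    "query 31": (7142.090, 2262.330),
    "query 4": (3231356.258, 12751.170),
    "query 11": (1824860.937, 7984.360),
}


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the `after` run is compared to the `before` run."""
    return before_ms / after_ms


for query, (nestloop_on, nestloop_off) in master_timings_ms.items():
    factor = speedup(nestloop_on, nestloop_off)
    print(f"{query}: {factor:.2f}x faster with enable_nestloop off")

# Query 31 with nestloop off on both sides: master vs. patched.
patched_q31_ms = 1214.376
factor = speedup(master_timings_ms["query 31"][1], patched_q31_ms)
print(f"query 31, patched vs. master: {factor:.2f}x faster")
```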
I believe you can find more such queries in TPC-DS if you keep
looking. Given this, I don't think it makes much sense to debug a
performance regression on TPC-DS with nestloop enabled.
Matheus, I wonder if you could help run TPC-DS again with this patch,
this time with nested loops disabled for all queries.
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-02 18:40 Matheus Alcantara <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Matheus Alcantara @ 2025-10-02 18:40 UTC (permalink / raw)
To: Richard Guo <[email protected]>; Matheus Alcantara <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Thu Oct 2, 2025 at 5:49 AM -03, Richard Guo wrote:
> On Thu, Oct 2, 2025 at 10:39 AM Richard Guo <[email protected]> wrote:
>> It seems eager aggregation doesn't cope well with parallel plans for
>> this query. Looking into it.
>
> It turns out that this is not related to parallel plans but rather to
> poor size estimates.
>
> [ ... ]
> Matheus, I wonder if you could help run TPC-DS again with this patch,
> this time with nested loops disabled for all queries.
>
Thanks for all the details. I've disabled nested loops and run the
benchmark again, and the results look much better! I now see a 55%
improvement on query_31 on my machine (macOS, M3 Max).
The only query where I see a considerable regression is query 23, for
which I get a 23% worse execution time. I'm attaching the
EXPLAIN(ANALYZE) output from master and from the patched version in
case it's of interest.
I'm also attaching a CSV with the planning time and execution time from
master and the patched version for all queries. It contains the
percentage difference between the two runs. Negative numbers mean that
the patched version using eager aggregation is faster. (I loaded this
CSV into a Postgres table and ran some queries against it to analyze
the results.)
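As a rough illustration of how the difference columns can be read, the Python sketch below recomputes the execution difference for two rows of the attached CSV. The formula (patched - master) / master * 100 is not stated in this message; it is inferred because it reproduces the posted values. The function names and the two-row `SAMPLE` are hypothetical, and the row values are copied from the attachment.

```python
import csv
import io

# Two rows copied from the attached CSV (diff columns omitted on purpose).
SAMPLE = """\
Query,Patched Planning (ms),Patched Execution (ms),Master Planning (ms),Master Execution (ms)
query_23.sql,6.317,6733.136,2.386,5465.139
query_31.sql,3.805,2156.624,3.99,4813.289
"""


def pct_diff(patched_ms: float, master_ms: float) -> float:
    """Percentage difference relative to master; negative means patched is faster.
    Assumed formula, inferred from the posted numbers."""
    return (patched_ms - master_ms) / master_ms * 100.0


def execution_diffs(csv_text: str) -> dict[str, float]:
    """Map each query name to its execution-time difference in percent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["Query"]: pct_diff(float(row["Patched Execution (ms)"]),
                               float(row["Master Execution (ms)"]))
        for row in reader
    }


diffs = execution_diffs(SAMPLE)
# List the largest regressions first.
for query, diff in sorted(diffs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{query}: {diff:+.2f}%")
```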
I'm just wondering if there is anything that can be done in the planner
to prevent this type of situation?
--
Matheus Alcantara
Attachments:
[application/octet-stream] query-23.master.explain (39.5K, 2-query-23.master.explain)
[application/octet-stream] query-23.patch.explain (38.4K, 3-query-23.patch.explain)
[text/csv] tpcds-eager-aggregate-times.csv (7.8K, 4-tpcds-eager-aggregate-times.csv)
download | inline:
Query,Patched Planning (ms),Patched Execution (ms),Master Planning (ms),Master Execution (ms),Planning Diff (%),Execution Diff (%)
query_1.sql,5.197,343109.883,5.718,342439.125,-9.111577474641482,0.19587656638241105
query_10.sql,9.652,1907.724,7.42,1711.916,30.080862533692716,11.4379443851217
query_11.sql,2.097,3679.389,1.902,12420.909,10.252365930599373,-70.37745788170577
query_12.sql,3.706,134.387,6.555,120.692,-43.46300533943555,11.34706525701787
query_13.sql,4.024,1470.213,3.821,1465.751,5.312745354619206,0.3044173259987535
query_14.sql,6.325,3035.944,5.998,3058.458,5.45181727242414,-0.7361225820331724
query_15.sql,1.706,223.125,1.782,221.967,-4.264870931537602,0.5216991714984601
query_16.sql,4.3,335.252,3.871,332.84,11.08240764660294,0.724672515322688
query_17.sql,17.019,586.035,14.329,584.251,18.77311745411402,0.30534821506509907
query_18.sql,4.558,831.184,4.739,819.676,-3.819371175353451,1.4039693732645488
query_19.sql,4.043,348.386,3.426,345.351,18.009340338587272,0.8788160451251118
query_2.sql,1.084,1009.305,1.137,1009.213,-4.66138962181178,0.009116014161528293
query_20.sql,1.411,197.526,1.331,196.463,6.010518407212627,0.5410688017591183
query_21.sql,3.656,759.008,3.374,759.377,8.358032009484292,-0.048592464612427624
query_22.sql,1.062,9664.424,1.155,9720.983,-8.051948051948049,-0.5818238752191963
query_23.sql,6.317,6733.136,2.386,5465.139,164.75272422464374,23.20155077482934
query_24.sql,4.863,71.777,6.99,69.682,-30.42918454935622,3.0065153124192743
query_25.sql,32.706,565.499,29.09,567.284,12.430388449639063,-0.314657208734949
query_26.sql,4.732,500.593,3.597,494.797,31.554072838476515,1.1713894789176151
query_27.sql,1.946,800.924,1.834,795.803,6.106870229007626,0.6435009669478478
query_28.sql,1.403,2115.748,1.177,2109.951,19.201359388275275,0.2747457168436625
query_29.sql,20.743,680.826,18.767,697.571,10.529120264293702,-2.400472496706429
query_3.sql,1.048,306.902,1.037,338.807,1.0607521697203588,-9.416865649174907
query_30.sql,2.163,23196.843,2.62,23227.62,-17.442748091603065,-0.1325017371560161
query_31.sql,3.805,2156.624,3.99,4813.289,-4.636591478696743,-55.19437956042116
query_32.sql,1.376,369.863,1.426,379.844,-3.5063113604488114,-2.627657669990837
query_33.sql,5.592,683.848,4.386,671.533,27.49658002735977,1.8338637118354484
query_34.sql,2.706,293.647,2.868,293.764,-5.648535564853554,-0.03982788905380463
query_35.sql,2.297,1714.709,2.327,1709.587,-1.2892135797163646,0.29960452436758533
query_36.sql,1.341,959.408,1.406,958.635,-4.623044096728304,0.08063548691629499
query_37.sql,3.266,701.037,3.338,692.29,-2.156980227681248,1.2634878446893747
query_38.sql,1.938,2983.255,1.867,2970.44,3.802892340653452,0.43141756776773993
query_39.sql,2.434,4296.654,2.185,4297.245,11.395881006864993,-0.013752997560051609
query_4.sql,4.104,6885.96,3.93,20300.931,4.427480916030532,-66.08057039354502
query_40.sql,4.232,227.916,3.992,226.594,6.0120240480961975,0.5834223324536407
query_41.sql,0.85,1895.989,0.825,1917.606,3.030303030303033,-1.127291007641818
query_42.sql,1.134,216.127,1.088,215.79,4.227941176470571,0.15617035080403055
query_43.sql,1.13,724.987,1.068,724.42,5.805243445692868,0.07826951216145431
query_44.sql,1.007,1009.076,0.973,1015.087,3.4943473792394575,-0.5921659916834682
query_45.sql,2.491,146.108,2.888,148.276,-13.746537396121877,-1.4621381747551905
query_46.sql,2.585,663.085,2.231,679.81,15.867324069923805,-2.460246245274402
query_47.sql,2.107,3566.484,2.349,4028.359,-10.302256279267773,-11.465586855590578
query_48.sql,2.327,1417.187,2.468,1429.552,-5.713128038897894,-0.8649562939997992
query_49.sql,5.191,1332.436,5.117,1300.731,1.446159859292551,2.437475542598733
query_5.sql,3.996,1254.475,3.78,1239.619,5.71428571428572,1.1984327442544842
query_50.sql,3.58,1306.014,2.578,1280.202,38.86733902249807,2.0162443114445923
query_51.sql,1.138,1937.95,1.043,1927.959,9.108341323106423,0.518216414353209
query_52.sql,1.057,216.683,1.026,217.304,3.0214424951266974,-0.2857747671464903
query_53.sql,1.689,299.636,1.477,299.117,14.353419092755582,0.17351069982649112
query_54.sql,3.396,690.892,2.901,687.181,17.063081695966915,0.5400323932122705
query_55.sql,1.041,215.656,0.958,216.543,8.663883089770351,-0.409618412971096
query_56.sql,6.743,696.477,5.359,682.625,25.825713752565782,2.0292254165903643
query_57.sql,2.859,1935.809,2.396,1971.9,19.323873121869788,-1.8302652264313668
query_58.sql,2.893,761.302,2.47,743.917,17.125506072874476,2.3369542569937227
query_59.sql,1.818,1294.186,1.722,1292.091,5.5749128919860675,0.1621402826890697
query_6.sql,2.387,132211.841,1.918,144414.127,24.452554744525553,-8.44950992917751
query_60.sql,4.764,709.541,8.15,770.35,-41.54601226993865,-7.8936846887778245
query_61.sql,4.542,6.09,4.613,6.447,-1.5391285497507177,-5.5374592833876255
query_62.sql,2.194,277.489,2.129,279.699,3.0530765617660847,-0.7901351095284703
query_63.sql,1.609,274.35,1.544,308.721,4.2098445595854885,-11.133353416191312
query_64.sql,231.018,993.314,110.067,993.579,109.88852244541962,-0.026671256135645617
query_65.sql,1.547,1432.056,1.402,1459.18,10.342368045649074,-1.858852232075551
query_66.sql,4.873,459.169,4.288,456.478,13.642723880597012,0.58951362387672
query_67.sql,1.332,6262.641,1.321,6268.535,0.8327024981075034,-0.09402515898850741
query_68.sql,2.459,434.767,2.04,434.896,20.539215686274513,-0.029662264081531928
query_69.sql,3.622,545.235,2.971,559.032,21.91181420397172,-2.468016142188645
query_7.sql,2.59,740.428,1.911,756.807,35.53113553113552,-2.1642241681168404
query_70.sql,1.346,1085.83,1.276,1093.831,5.4858934169279046,-0.7314658297305504
query_71.sql,1.764,690.918,1.636,695.244,7.823960880195606,-0.6222275920396324
query_72.sql,16.468,2433.574,15.637,2422.972,5.314318603312652,0.4375618042635186
query_73.sql,1.561,242.764,1.373,246.741,13.692643845593585,-1.6118115757008376
query_74.sql,2.275,2600.782,1.636,2613.011,39.05867970660147,-0.4680041530632598
query_75.sql,3.936,2060.653,3.872,2021.916,1.6528925619834725,1.9158560494105519
query_76.sql,1.839,262.956,1.808,256.183,1.7146017699114997,2.6438132116494946
query_77.sql,6.134,506.031,4.12,503.471,48.88349514563107,0.5084701998724857
query_78.sql,3.479,3376.111,2.942,3346.175,18.252889191026508,0.8946334247312138
query_79.sql,1.943,494.783,1.66,500.474,17.04819277108435,-1.1371220083360922
query_8.sql,2.108,118.778,1.603,117.003,31.503431066749854,1.5170551182448362
query_80.sql,9.398,810.869,7.552,767.436,24.44385593220339,5.659494733111294
query_81.sql,1.601,102358.136,1.673,101992.064,-4.303646144650332,0.3589220431895565
query_82.sql,2.106,910.711,1.992,888.395,5.722891566265054,2.511945699829471
query_83.sql,2.732,151.69,2.419,147.383,12.939231087226133,2.9223180421079684
query_84.sql,3.255,164.327,3.084,159.529,5.544747081712057,3.0076036331952194
query_85.sql,11.396,609.845,9.978,598.002,14.211264782521557,1.9804281591031596
query_86.sql,0.963,417.937,0.924,409.646,4.220779220779212,2.023942623631134
query_87.sql,1.908,2794.814,1.868,2739.314,2.1413276231263283,2.0260546983660874
query_88.sql,4.025,1909.274,3.887,1872.028,3.550295857988175,1.989606993057789
query_89.sql,1.589,448.853,1.409,437.45,12.775017743080195,2.6066979083323853
query_9.sql,1.044,2384.919,1.005,2353.257,3.8805970149253883,1.345454406382295
query_90.sql,1.568,239.619,1.424,234.375,10.112359550561807,2.23744
query_91.sql,3.797,207.382,2.786,202.386,36.28858578607323,2.468550196159818
query_92.sql,1.132,76.153,1.149,76.136,-1.4795474325500544,0.022328464852382737
query_93.sql,1.293,3.116,1.183,2.986,9.298393913778519,4.3536503683857966
query_94.sql,2.145,257.063,2.005,254.546,6.982543640897762,0.9888193096729062
query_95.sql,2.029,9785.071,2.102,9640.791,-3.4728829686013296,1.496557699466783
query_96.sql,1.056,233.41,1.06,229.286,-0.37735849056603804,1.7986270422092911
query_97.sql,1.142,1025.226,1.2,1015.871,-4.833333333333338,0.9208846398804702
query_98.sql,1.297,356.808,1.209,355.641,7.278742762613718,0.32813989388174397
query_99.sql,1.59,583.963,1.472,571.363,8.016304347826095,2.2052530527877914
* Re: Eager aggregation, take 3
@ 2025-10-03 03:14 Richard Guo <[email protected]>
parent: Matheus Alcantara <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-10-03 03:14 UTC (permalink / raw)
To: Matheus Alcantara <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Fri, Oct 3, 2025 at 3:41 AM Matheus Alcantara
<[email protected]> wrote:
> Thanks for all the details. I've disabled the nested loops and executed
> the benchmark again and the results look much better! I see a 55%
> improvement on query_31 on my machine now (MacOS M3 Max).
Great! That is 2.23 times faster.
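The 2.23 figure can be recovered from the query_31 row of the CSV, either from the raw execution times or from the percentage difference. A small sketch in C, assuming the percentage is defined as (patched - master) / master * 100; both function names are hypothetical and exist only for this illustration:

```c
#include <assert.h>

/* Speedup as a plain ratio of execution times: master over patched. */
double
speedup_factor(double master_ms, double patched_ms)
{
    assert(patched_ms > 0.0);
    return master_ms / patched_ms;
}

/*
 * The same speedup derived from a percentage difference, where a negative
 * percentage means the patched build is faster.
 */
double
speedup_from_percent_diff(double diff_percent)
{
    assert(diff_percent > -100.0);  /* -100% would mean zero run time */
    return 100.0 / (100.0 + diff_percent);
}
```

For query_31, 4813.289 ms on master against 2156.624 ms patched is a ratio of about 2.23, and a difference of -55.19% yields the same factor.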
> The only query that I see a considerable regression is query 23 which I
> get a 23% worst execution time. I'm attaching the EXPLAIN(ANALYZE)
> output from master and from the patched version if it's interesting.
I tested query 23 in my local environment but didn't observe the
regression.
-- on master
Planning Time: 1.950 ms
Execution Time: 3260.924 ms
-- on patched
Planning Time: 2.197 ms
Execution Time: 3237.287 ms
I ran the benchmark at scale factor 1 and executed ANALYZE beforehand.
For the build configuration, I disabled cassert.
Comparing the plans, I noticed one key difference: in the plan you
provided (query-23.patch.explain), the frequent_ss_items CTE uses
parallel aggregation, whereas in my local environment it does not.
This leads to a different final join order between the two plans.
However, given the highly inaccurate size and cost estimates for the
CTE Scan nodes, I'm not sure it's worth investigating further. I'm
starting to feel that trying to tune performance here, with such
inaccurate underlying estimates for CTEs, is like building on sand.
> I'm also attaching a csv with the planning time and execution time from
> master and the patched version for all queries. It contains the % of
> difference between the executions. Negative numbers means that the
> patched version using eager aggregation is faster. (I loaded this csv on
> a postgres table and played with some queries to analyze the results).
I really appreciate this; it's very helpful.
> I'm just wondering if there is anything that can be done on the planner
> to prevent this type of situation?
I think the ideal solution is to improve our estimates for CTE
relations to make the plans for TPC-DS queries more reasonable. Of
course, for queries from other benchmarks, the issues may stem from
other plan nodes. IMHO, we really need some improvements in our cost
estimation.
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-03 20:03 Matheus Alcantara <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Matheus Alcantara @ 2025-10-03 20:03 UTC (permalink / raw)
To: Richard Guo <[email protected]>; Matheus Alcantara <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Fri Oct 3, 2025 at 12:14 AM -03, Richard Guo wrote:
>> The only query that I see a considerable regression is query 23 which I
>> get a 23% worst execution time. I'm attaching the EXPLAIN(ANALYZE)
>> output from master and from the patched version if it's interesting.
>
> I tested query 23 in my local environment but didn't observe the
> regression.
>
> -- on master
> Planning Time: 1.950 ms
> Execution Time: 3260.924 ms
>
> -- on patched
> Planning Time: 2.197 ms
> Execution Time: 3237.287 ms
>
> I ran the benchmark at scale factor 1 and executed ANALYZE beforehand.
> For the build configuration, I disabled cassert.
>
I've disabled cassert and run ANALYZE again before benchmarking, and now
I get similar results, with an improvement in the eager aggregation
version:
-- master
Planning Time: 2.734 ms
Execution Time: 5238.128 ms
-- patched
Planning Time: 2.578 ms
Execution Time: 4732.584 ms
> Comparing the plans, I noticed one key difference: in the plan you
> provided (query-23.patch.explain), the frequent_ss_items CTE uses
> parallel aggregation, whereas in my local environment it does not.
> This leads to a different final join order between the two plans.
>
> However, given the highly inaccurate size and cost estimates for the
> CTE Scan nodes, I'm not sure it's worth investigating further. I'm
> starting to feel that trying to tune performance here, with such
> inaccurate underlying estimates for CTEs, is like building on sand.
>
> [ ...]
>
>> I'm just wondering if there is anything that can be done on the planner
>> to prevent this type of situation?
>
> I think the ideal solution is to improve our estimates for CTE
> relations to make the plans for TPC-DS queries more reasonable. Of
> course, for queries from other benchmarks, the issues may stem from
> other plan nodes. IMHO, we really need some improvements in our cost
> estimation.
>
Fair points, agree.
The performance results look good to me. I don't have many comments
about the code. Although I'm still learning about the planner internals,
this patch seems to be in good shape to me.
I'm just attaching a new csv with the last results after running with
cassert disabled and after executing ANALYZE. It looks good to me.
Thanks for working on this!
--
Matheus Alcantara
Attachments:
[text/csv] tpcds-eager-aggregate-times.csv (7.8K, 2-tpcds-eager-aggregate-times.csv)
download | inline:
Query,Patched Planning (ms),Patched Execution (ms),Master Planning (ms),Master Execution (ms),Planning Diff (%),Execution Diff (%)
query_1.sql,1.772,348111.128,1.448,347986.657,22.37569060773481,0.035768900185164154
query_10.sql,3.916,1708.264,3.628,1735.879,7.93825799338478,-1.5908366885019065
query_11.sql,1.874,3938.543,1.732,12631.077,8.198614318706705,-68.81862884693048
query_12.sql,2.423,118.035,1.938,120.104,25.025799793601657,-1.7226736828082352
query_13.sql,3.725,1449.302,3.875,1472.918,-3.8709677419354813,-1.603347912103728
query_14.sql,5.585,3142.689,4.926,3153.894,13.377994315874945,-0.3552750980216814
query_15.sql,3.787,229.036,16.127,226.61,-76.51764122279407,1.070561758086575
query_16.sql,3.933,340.588,3.744,330.124,5.048076923076913,3.169718045340538
query_17.sql,20.183,582.729,16.598,581.64,21.598987829859027,0.18722921394678074
query_18.sql,6.141,832.748,5.543,895.849,10.788381742738586,-7.043709375129067
query_19.sql,4.363,345.011,3.951,341.711,10.427739812705653,0.9657283493946672
query_2.sql,1.777,1029.284,2.096,1013.598,-15.219465648854968,1.5475563290377594
query_20.sql,1.605,199.484,1.696,198.66,-5.365566037735848,0.41477901943018836
query_21.sql,2.993,763.408,2.926,760.504,2.289815447710175,0.38185203496628506
query_22.sql,1.081,9782.704,1.017,9812.876,6.293018682399219,-0.3074735684013584
query_23.sql,8.857,4937.117,7.745,5440.361,14.357650096836657,-9.25019497787003
query_24.sql,6.435,73.181,5.783,68.873,11.274425038907127,6.254991070521093
query_25.sql,33.859,563.156,29.377,562.256,15.256833577288365,0.1600694345636111
query_26.sql,4.605,496.635,3.725,461.931,23.624161073825512,7.512810354793251
query_27.sql,2.108,802.41,2.013,791.632,4.719324391455549,1.3614911979303541
query_28.sql,1.203,2129.017,1.157,2114.691,3.975799481417462,0.6774512210058123
query_29.sql,20.799,699.452,17.871,695.774,16.384085949303344,0.5286199254355577
query_3.sql,1.076,316.886,1.465,314.489,-26.552901023890783,0.7621888205946944
query_30.sql,2.068,23992.385,2.01,23732.395,2.8855721393034965,1.0955067956689495
query_31.sql,3.443,2170.827,3.71,4956.94,-7.196765498652288,-56.20630873078956
query_32.sql,1.46,381.865,1.558,384.881,-6.290115532734281,-0.7836188328340352
query_33.sql,6.144,665.558,5.328,683.21,15.315315315315312,-2.583685835980159
query_34.sql,3.072,294.182,2.48,294.62,23.870967741935488,-0.14866607833819434
query_35.sql,3.058,1696.758,3.066,1741.545,-0.26092628832354886,-2.571682040946403
query_36.sql,1.472,951.782,1.543,957.131,-4.601425793907969,-0.5588576694308232
query_37.sql,2.862,692.185,3.06,695.129,-6.470588235294115,-0.42351851239123584
query_38.sql,1.757,2934.366,2.039,2945.52,-13.830308974987751,-0.3786767701458485
query_39.sql,2.106,4287.367,2.459,4355.186,-14.355429036193582,-1.5572010012890267
query_4.sql,4.78,7367.944,4.783,20359.41,-0.06272214091574563,-63.81062123116534
query_40.sql,4.527,229.529,17.79,235.618,-74.55311973018549,-2.584267755434644
query_41.sql,1.076,1911.485,1.364,1898.805,-21.114369501466275,0.6677884248250787
query_42.sql,2.185,214.184,1.876,216.248,16.471215351812376,-0.9544596944249164
query_43.sql,1.618,717.386,1.678,719.886,-3.5756853396900974,-0.3472772077801208
query_44.sql,1.095,999.478,1.022,1020.661,7.142857142857138,-2.0754197524937266
query_45.sql,4.503,147.938,3.965,150.025,13.568726355611608,-1.3911014830861639
query_46.sql,3.389,670.253,2.905,669.081,16.66092943201377,0.17516563764327867
query_47.sql,2.107,3993.831,2.096,3928.992,0.5248091603053493,1.650270603757909
query_48.sql,3.193,1433.704,3.028,1413.382,5.449141347424043,1.4378278483806846
query_49.sql,6.013,1328.264,5.649,1305.393,6.443618339529118,1.7520394241427575
query_5.sql,12.624,1248.898,4.026,1263.176,213.56184798807752,-1.1303254653349986
query_50.sql,3.642,1363.389,2.689,1318.568,35.44068426924507,3.3992179394615913
query_51.sql,1.567,1934.372,1.078,1905.328,45.36178107606678,1.524356961111163
query_52.sql,1.323,215.693,1.063,215.418,24.459078080903108,0.12765878431700492
query_53.sql,1.74,296.488,1.558,297.292,11.681643132220792,-0.27044118240651405
query_54.sql,3.749,689.41,3.331,684.939,12.548784148904238,0.6527588588180852
query_55.sql,1.077,213.579,0.969,214.217,11.145510835913312,-0.29782883711377023
query_56.sql,7.254,693.427,5.672,691.625,27.891396332863188,0.26054581601301585
query_57.sql,2.649,1828.543,2.581,1956.034,2.634637737311122,-6.517831489636694
query_58.sql,2.929,769.344,2.601,752.896,12.610534409842364,2.184631077864684
query_59.sql,1.899,1275.897,1.797,1287.483,5.676126878130222,-0.8998953772593512
query_6.sql,2.646,136423.314,1.94,148868.237,36.391752577319586,-8.359689918273151
query_60.sql,4.999,723.689,11.688,772.551,-57.22963723477071,-6.324760436527825
query_61.sql,5.127,6.56,6.288,6.482,-18.463740458015273,1.2033323048441746
query_62.sql,2.522,279.014,2.42,276.941,4.2148760330578465,0.7485348864920818
query_63.sql,1.722,271.157,1.439,299.55,19.66643502432244,-9.478551160073453
query_64.sql,246.953,961.432,108.404,1220.635,127.80801446441092,-21.235094848173286
query_65.sql,2.185,1451.433,1.613,1459.66,35.46187228766274,-0.5636244056835213
query_66.sql,6.342,469.304,4.331,461.772,46.432694527822655,1.6311079926890288
query_67.sql,1.381,6239.233,1.333,6275.725,3.600900225056267,-0.581478633942695
query_68.sql,2.712,444.651,2.462,454.96,10.154346060113728,-2.2659134868999407
query_69.sql,3.659,549.403,3.068,559.8,19.26336375488917,-1.8572704537334648
query_7.sql,2.512,748.722,1.877,763.937,33.830580713905164,-1.991656380041814
query_70.sql,1.378,1082.349,1.282,1099.89,7.488299531981268,-1.5947958432207008
query_71.sql,1.616,664.626,1.879,690.085,-13.996806812134107,-3.6892556714028064
query_72.sql,17.423,2425.441,16.905,2431.505,3.06418219461696,-0.24939286573543157
query_73.sql,1.532,240.721,1.491,248.724,2.749832327297111,-3.2176227464981206
query_74.sql,2.461,2606.065,1.679,2695.639,46.57534246575341,-3.32292269105767
query_75.sql,5.763,2152.559,5.39,2252.586,6.920222634508353,-4.440540782904608
query_76.sql,1.77,260.015,1.832,273.168,-3.384279475982536,-4.814985649856506
query_77.sql,7.189,502.823,4.539,504.798,58.382903723287086,-0.3912456071537571
query_78.sql,4.667,3404.293,3.075,3526.288,51.77235772357722,-3.4595869651032443
query_79.sql,2.518,441.678,1.757,497.753,43.31246442800227,-11.265627731023216
query_8.sql,2.616,113.6,1.731,118.141,51.12651646447141,-3.8437121744356415
query_80.sql,9.607,777.569,7.965,794.619,20.615191462649083,-2.145682396217567
query_81.sql,1.744,104737.831,1.539,103999.861,13.320337881741395,0.7095874868525076
query_82.sql,1.98,904.683,1.956,903.053,1.226993865030676,0.18049881900619294
query_83.sql,2.78,159.719,2.572,155.356,8.087091757387237,2.808388475501429
query_84.sql,3.311,164.835,3.243,162.348,2.096823928461303,1.5318944489614867
query_85.sql,11.635,607.475,9.547,603.498,21.87074473656645,0.6589914133932465
query_86.sql,1.038,444.518,0.948,435.156,9.493670886075959,2.151412367059162
query_87.sql,2.509,3033.169,1.84,3022.709,36.3586956521739,0.34604720467633626
query_88.sql,4.439,1887.336,4.055,1882.765,9.469790382244152,0.2427812286716564
query_89.sql,1.578,447.047,1.429,447.93,10.426871938418476,-0.19712901569441235
query_9.sql,0.999,2377.692,1.06,2359.84,-5.754716981132081,0.7564919655569811
query_90.sql,1.484,243.019,1.521,240.875,-2.432610124917812,0.8900882200311387
query_91.sql,4.539,211.898,3.484,203.032,30.281285878300796,4.366799322274314
query_92.sql,1.185,76.483,1.149,77.097,3.1331592689295062,-0.7963993410897832
query_93.sql,1.427,3.337,1.236,3.07,15.453074433656964,8.697068403908807
query_94.sql,2.112,265.029,2.338,259.955,-9.666381522668946,1.9518762862803227
query_95.sql,2.02,9955.019,1.959,9880.937,3.1138335885655914,0.7497467092442786
query_96.sql,1.295,231.706,1.092,226.021,18.589743589743573,2.5152530074639095
query_97.sql,1.204,1032.419,1.11,1033.895,8.468468468468455,-0.14276111210518336
query_98.sql,1.472,374.712,1.341,366.122,9.768829231916481,2.346212464697553
query_99.sql,1.509,585.547,1.578,587.693,-4.372623574144498,-0.365156637904477
* Re: Eager aggregation, take 3
@ 2025-10-06 00:56 Richard Guo <[email protected]>
parent: Matheus Alcantara <[email protected]>
0 siblings, 0 replies; 55+ messages in thread
From: Richard Guo @ 2025-10-06 00:56 UTC (permalink / raw)
To: Matheus Alcantara <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Sat, Oct 4, 2025 at 5:03 AM Matheus Alcantara
<[email protected]> wrote:
> I've disabled the cassert and executed the ANALYZE again before
> benchmarking and now I have similar results with a improvement on eager
> aggregate version:
>
> -- master
> Planning Time: 2.734 ms
> Execution Time: 5238.128 ms
>
> -- patched
> Planning Time: 2.578 ms
> Execution Time: 4732.584 ms
Great!
> The performance results look good to me. I don't have to much comments
> about the code although I'm still learning about the planner internals
> this patch seems in good shape to me.
Thanks for running the benchmark and reviewing the patch.
> I'm just attaching a new csv with the last results after running with
> cassert disabled and after executing ANALYZE. It looks good to me.
Yeah, the results look good this time. There are no performance
regressions; on the contrary, several queries actually show really
nice improvements.
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-06 00:59 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
1 sibling, 2 replies; 55+ messages in thread
From: Richard Guo @ 2025-10-06 00:59 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Mon, Sep 29, 2025 at 11:09 AM Richard Guo <[email protected]> wrote:
> FWIW, I plan to do another self-review of this patch soon, with the
> goal of assessing whether it's ready to be pushed. If anyone has any
> concerns about any part of the patch or would like to review it, I
> would greatly appreciate hearing from you.
Barring any objections, I'll plan to push v23 in a couple of days.
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-06 13:59 David Rowley <[email protected]>
parent: Richard Guo <[email protected]>
1 sibling, 1 reply; 55+ messages in thread
From: David Rowley @ 2025-10-06 13:59 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Mon, 6 Oct 2025 at 13:59, Richard Guo <[email protected]> wrote:
> Barring any objections, I'll plan to push v23 in a couple of days.
Not a complete review, but a cursory look:
1. setup_base_grouped_rels(), by its name and header comment, claims to
operate on base relations, but the code seems to be written to handle
OTHER_MEMBER rels too.
Note that set_base_rel_pathlists() explicitly skips anything that's
not RELOPT_BASEREL, so if you're not doing that, then you shouldn't
use "base" in the function name. It's confusing.
2. All the calls to generate_grouped_paths() pass the grouped_rel
RelOptInfo and also grouped_rel->agg_info. Is there a reason to keep
it that way rather than access the agg_info from the given grouped_rel
from within the function?
3. " * The information needed are provided by the RelAggInfo
structure." This should use "is" rather than "are"
4. standard_join_search(). I think it's worth getting rid of the
duplicate "if (!bms_equal(rel->relids, root->all_query_rels))" check.
How about storing the result in a local variable rather than calling
bms_equal() again? I don't believe the compiler will optimise the extra
one away, as it can't know set_cheapest() doesn't change the relids. Also,
wouldn't it be better to check rel->grouped_rel != NULL first? Won't
that be NULL in most cases, whereas !bms_equal(rel->relids,
root->all_query_rels) will be true in most cases? Likewise in
generate_partitionwise_join_paths().
5. Wouldn't it be better to do 0002 first and get that into core so
you don't have to do the hacky stuff in is_partial_agg_memory_risky()?
6. Shouldn't this be using lappend()?
agg_clause_list = list_append_unique(agg_clause_list, ac_info);
I don't understand why ac_info could already be in the list. You've
just done: ac_info = makeNode(AggClauseInfo);
7. The following comment talks about "base" relations. I don't think
it should be as the RelOptInfo can be an OTHER_MEMBER rel.
* build_simple_grouped_rel
* Construct a new RelOptInfo representing a grouped version of the input
* base relation.
*/
8. Normally we check the List is NIL instead of:
if (list_length(group_clauses) == 0)
9. In get_expression_sortgroupref(), a comment claims "We ignore child
members here.". I think that's outdated since ec_members no longer has
child members.
10. I don't think this comment quite makes sense:
* "apply_at" tracks the lowest join level at which partial aggregation is
* applied.
maybe "minimum set of rels to join before partial aggregation can be applied"?
or at least swap "is" for "can be".
My confusion comes from the fact you're stating "lowest join level",
which seems to indicate that it could be applied after further
relations have been joined, but then you're saying "is applied" to
indicate that it can only be applied at that level.
11. The way you've written the header comments for typedef struct
RelAggInfo seems weird. I've only ever seen extra details in the
header comment when the inline comments have been kept to a single
line. You're spanning multiple lines, so why have the out of line
comments in the header at all?
12. This just doesn't feel like the right name for this field:
/* lowest level partial aggregation is applied at */
Relids apply_at;
I can't help thinking that it should be something like "agg_relids" or
"required_relids". I understand you're currently only applying the
partial grouping when you get exactly the minimum set of relids in the
join search, but if this can be made fast enough, I expect that could
be changed in the future. If you do change it, then "apply_at" is a
pretty confusing name. Perhaps I've misunderstood here and if you did
that, you'd need to create another RelAggInfo to represent that?
13. Parameter names mismatch between definition and declaration in:
extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
                                            RelOptInfo *rel_plain);
extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
                                     RelOptInfo *rel_plain);
extern void generate_grouped_paths(PlannerInfo *root,
                                   RelOptInfo *rel_grouped,
                                   RelOptInfo *rel_plain,
                                   RelAggInfo *agg_info);
14. Do all the regression tests need VERBOSE in EXPLAIN? It's making
the output kinda huge. It might also be nice to wrap the long queries
onto multiple lines to make them easier to read.
David
* Re: Eager aggregation, take 3
@ 2025-10-07 10:56 Richard Guo <[email protected]>
parent: David Rowley <[email protected]>
0 siblings, 2 replies; 55+ messages in thread
From: Richard Guo @ 2025-10-07 10:56 UTC (permalink / raw)
To: David Rowley <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Mon, Oct 6, 2025 at 10:59 PM David Rowley <[email protected]> wrote:
> Not a complete review, but a customary look:
Thanks for all the comments! They've been very helpful.
> 1. setup_base_grouped_rels() by name and the header comment claim to
> operate on base relations, but the code seems to be coded to handle
> OTHER_MEMBER rels too.
Indeed. I renamed it to setup_simple_grouped_rels() and updated the
related comments in v24.
> 2. All the calls to generate_grouped_paths() pass the grouped_rel
> RelOptInfo and also grouped_rel->agg_info. Is there a reason to keep
> it that way rather than access the agg_info from the given grouped_rel
> from within the function?
Thanks. Fixed by removing the agg_info parameter.
> 3. " * The information needed are provided by the RelAggInfo
> structure." This should use "is" rather than "are"
Yes.
> 4. standard_join_search(). I think it's worth getting rid of the
> duplicate "if (!bms_equal(rel->relids, root->all_query_rels))" check.
> How about setting that in a local variable rather than recalling
> bms_equal(). I don't believe the compiler will optimise the extra one
> away as it can't know set_cheapest() doesn't change the relids. Also,
> wouldn't it be better to check rel->grouped_rel != NULL first? Won't
> that be NULL in most cases, where as !bms_equal(rel->relids,
> root->all_query_rels) will be true in most cases? Likewise in
> generate_partitionwise_join_paths().
Good point. Done that way in v24.
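The shape of that change can be sketched outside the planner. The types and function names below are hypothetical stand-ins (the real code works on RelOptInfo and Bitmapset, and the two guarded steps are only placeholders here), so this shows the pattern rather than the v24 code: the set comparison is evaluated once into a local variable, and the cheap pointer test is placed first in the second condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for Relids (a Bitmapset) and RelOptInfo. */
typedef unsigned long MockRelids;

typedef struct MockRel
{
    MockRelids  relids;
    struct MockRel *grouped_rel;    /* NULL when there is no grouped version */
} MockRel;

/* Counts set comparisons so the saving is observable; illustration only. */
int         equal_calls = 0;

bool
mock_relids_equal(MockRelids a, MockRelids b)
{
    equal_calls++;
    return a == b;
}

/* Returns how many of the two "not the topmost rel" steps were taken. */
int
finish_join_rel(MockRel *rel, MockRelids all_query_rels)
{
    int         steps = 0;

    /* Evaluate the set comparison once and reuse the result. */
    bool        is_top_rel = mock_relids_equal(rel->relids, all_query_rels);

    if (!is_top_rel)
        steps++;                /* placeholder for the first guarded step */

    /* ... set_cheapest(rel) would run here in the planner ... */

    /* Cheap pointer test first; it is false for most rels. */
    if (rel->grouped_rel != NULL && !is_top_rel)
        steps++;                /* placeholder for the grouped-rel step */

    return steps;
}
```

With the local variable in place, the ordering of the two tests in the second condition matters less for cost, but testing the pointer first still reads naturally since it is NULL in the common case.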
> 5. Wouldn't it be better to do 0002 first and get that into core so
> you don't have to do the hacky stuff in is_partial_agg_memory_risky()?
Agreed. Done in v24.
> 6. Shouldn't this be using lappend()?
>
> agg_clause_list = list_append_unique(agg_clause_list, ac_info);
>
> I don't understand why ac_info could already be in the list. You've
> just done: ac_info = makeNode(AggClauseInfo);
A query can specify the same Aggref expressions multiple times in the
target list. Using lappend here can lead to duplicate partial Aggref
nodes in the targetlist of a grouped path, which is what I want to
avoid.
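The distinction rests on how membership is decided: list_append_unique() compares elements with equal(), that is by content, whereas pointer identity is what list_append_unique_ptr() checks. A freshly allocated node can therefore still match an existing list entry. The sketch below imitates this with hypothetical mock types and a fixed-size array; it is not the pg_list.h implementation, and the fields of AggClauseMock are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLAUSES 16

/* Hypothetical stand-in for an aggregate clause node. */
typedef struct AggClauseMock
{
    char        aggname[16];
    int         argcol;
} AggClauseMock;

typedef struct ClauseListMock
{
    AggClauseMock items[MAX_CLAUSES];
    int         length;
} ClauseListMock;

/* Content comparison, playing the role that equal() has for real nodes. */
bool
clause_equal(const AggClauseMock *a, const AggClauseMock *b)
{
    return strcmp(a->aggname, b->aggname) == 0 && a->argcol == b->argcol;
}

/* Unconditional append, analogous to lappend(). */
void
mock_append(ClauseListMock *list, AggClauseMock clause)
{
    assert(list->length < MAX_CLAUSES);
    list->items[list->length++] = clause;
}

/* Append only if no content-equal entry exists yet. */
void
mock_append_unique(ClauseListMock *list, AggClauseMock clause)
{
    for (int i = 0; i < list->length; i++)
    {
        if (clause_equal(&list->items[i], &clause))
            return;             /* duplicate by content: keep the list as is */
    }
    mock_append(list, clause);
}
```

Appending two separately built but identical sum(col 2) clauses leaves two entries with mock_append() and one with mock_append_unique(), which mirrors the duplicate partial Aggref problem described above.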
> 7. The following comment talks about "base" relations. I don't think
> it should be as the RelOptInfo can be an OTHER_MEMBER rel.
>
> * build_simple_grouped_rel
> * Construct a new RelOptInfo representing a grouped version of the input
> * base relation.
> */
Fixed in v24.
> 8. Normally we check the List is NIL instead of:
>
> if (list_length(group_clauses) == 0)
Right. Updated in v24.
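The two spellings behave the same because PostgreSQL represents the empty list as NIL, a NULL pointer, and does not allow a non-NULL List of length zero. A minimal sketch with hypothetical mock types (MockList and MOCK_NIL are not the pg_list.h definitions) showing the idiom:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for List; the empty list is a NULL pointer. */
typedef struct MockList
{
    int         length;
} MockList;

#define MOCK_NIL ((MockList *) NULL)

/* Mirrors the convention that the length of the empty list is zero. */
int
mock_list_length(const MockList *list)
{
    return list ? list->length : 0;
}

/* Preferred idiom: compare against NIL directly instead of asking for the length. */
bool
has_no_group_clauses(const MockList *group_clauses)
{
    return group_clauses == MOCK_NIL;
}
```

The direct comparison states the intent ("is the list empty?") without a function call, which is the convention the review comment refers to.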
> 9. In get_expression_sortgroupref(), a comment claims "We ignore child
> members here.". I think that's outdated since ec_members no longer has
> child members.
I think that comment is used to explain why we only scan ec_members
here. Similar comments can be found in many other places, such as in
equivclass.c:
    /*
     * Found our match. Scan the other EC members and attempt to generate
     * joinclauses. Ignore children here.
     */
    foreach(lc2, cur_ec->ec_members)
    {
> 10. I don't think this comment quite makes sense:
>
> * "apply_at" tracks the lowest join level at which partial aggregation is
> * applied.
>
> maybe "minimum set of rels to join before partial aggregation can be applied"?
>
> or at least swap "is" for "can be".
>
> My confusion comes from the fact you're stating "lowest join level",
> which seems to indicate that it could be applied after further
> relations have been joined, but then you're saying "is applied" to
> indicate that it can only be applied at that level.
>
> 11. The way you've written the header comments for typedef struct
> RelAggInfo seems weird. I've only ever seen extra details in the
> header comment when the inline comments have been kept to a single
> line. You're spanning multiple lines, so why have the out of line
> comments in the header at all?
>
> 12. This just doesn't feel like the right name for this field:
>
> /* lowest level partial aggregation is applied at */
> Relids apply_at;
>
> I can't help thinking that it should be something like "agg_relids" or
> "required_relids". I understand you're currently only applying the
> partial grouping when you get exactly the minimum set of relids in the
> join search, but if this can be made fast enough, I expect that could
> be changed in the future. If you do change it, then "apply_at" is a
> pretty confusing name. Perhaps I've misunderstood here and if you did
> that, you'd need to create another RelAggInfo to represent that?
Hmm, RelAggInfo is a per-relation structure; each grouped relation has
a valid RelAggInfo. The apply_at field represents the set of relids
where partial aggregation is applied within the paths of this grouped
relation. If we ever change this approach and allow the planner to
explore all join levels for placing partial aggregation, the apply_at
field will become obsolete (cf. the patches prior to v17).
I've updated the comment for apply_at to clarify that it refers to the
relids at which partial aggregation is applied.
I've also updated the comments within RelAggInfo to use one-line
style.
I retained the name of this field though.
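As a rough standalone illustration of how apply_at is consulted (not code from the patch): RelidSet and AggInfoSketch are hypothetical stand-ins for Relids and RelAggInfo, and should_add_partial_agg() is an invented helper name. Partial aggregation paths are only added for the rel whose relids exactly match apply_at, and only when the aggregation is deemed useful.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Relids: one bit per relation. */
typedef uint64_t RelidSet;

/* Hypothetical, reduced stand-in for RelAggInfo. */
typedef struct AggInfoSketch
{
	RelidSet	apply_at;		/* relids at which partial agg is applied */
	bool		agg_useful;		/* does it reduce the row count enough? */
} AggInfoSketch;

/*
 * Decide whether partial aggregation paths should be built on top of the
 * rel identified by rel_relids.  Only an exact match with apply_at counts;
 * a subset or a superset of apply_at is rejected.
 */
static bool
should_add_partial_agg(const AggInfoSketch *agg_info, RelidSet rel_relids)
{
	return agg_info->apply_at == rel_relids && agg_info->agg_useful;
}
```

Grouped rels above that level get their paths only by joining a grouped rel with a non-grouped rel, so no second partial aggregation is stacked on top.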
> 13. Parameter names mismatch between definition and declaration in:
>
> extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root,
> RelOptInfo *rel_plain);
> extern RelOptInfo *build_grouped_rel(PlannerInfo *root,
> RelOptInfo *rel_plain);
>
> extern void generate_grouped_paths(PlannerInfo *root,
> RelOptInfo *rel_grouped,
> RelOptInfo *rel_plain,
> RelAggInfo *agg_info);
Nice catch! Fixed in v24.
> 14. Do all the regression tests need VERBOSE in EXPLAIN? It's making
> the output kinda huge. It might also be nice to wrap the long queries
> onto multiple lines to make them easier to read.
One of the challenges in this patch is generating the correct target
list for each grouped relation. So I'm kind of inclined to retain
VERBOSE in EXPLAIN. As I recall, the output target list in the test
cases saved me several times during development when I introduced
problematic code changes.
I wrapped the long queries in v24.
- Richard
Attachments:
[application/octet-stream] v24-0001-Allow-negative-aggtransspace-to-indicate-unbound.patch (6.3K, 2-v24-0001-Allow-negative-aggtransspace-to-indicate-unbound.patch)
From dc5d4fb9bae1412c3230329d22616e13f3cc9662 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 7 Oct 2025 10:16:37 +0900
Subject: [PATCH v24 1/2] Allow negative aggtransspace to indicate unbounded
state size
This patch reuses the existing aggtransspace in pg_aggregate to
signal that an aggregate's transition state can grow unboundedly. If
aggtransspace is set to a negative value, it now indicates that the
transition state may consume unpredictable or large amounts of memory,
such as in aggregates like array_agg or string_agg that accumulate
input rows.
This information can be used by the planner to avoid applying
memory-sensitive optimizations (e.g., eager aggregation) when there is
a risk of excessive memory usage during partial aggregation.
Bump catalog version.
Per idea from Robert Haas, though applied differently than originally
suggested.
Discussion: https://postgr.es/m/CA+TgmoYbkvYwLa+1vOP7RDY7kO2=A7rppoPusoRXe44VDOGBPg@mail.gmail.com
---
doc/src/sgml/catalogs.sgml | 5 ++++-
doc/src/sgml/ref/create_aggregate.sgml | 11 ++++++++---
src/include/catalog/pg_aggregate.dat | 10 ++++++----
src/test/regress/expected/opr_sanity.out | 2 +-
src/test/regress/sql/opr_sanity.sql | 2 +-
5 files changed, 20 insertions(+), 10 deletions(-)
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e9095bedf21..3acc2222a87 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -596,7 +596,10 @@
</para>
<para>
Approximate average size (in bytes) of the transition state
- data, or zero to use a default estimate
+ data. A positive value provides an estimate; zero means to
+ use a default estimate. A negative value indicates the state
+ data can grow unboundedly in size, such as when the aggregate
+ accumulates input rows (e.g., array_agg, string_agg).
</para></entry>
</row>
diff --git a/doc/src/sgml/ref/create_aggregate.sgml b/doc/src/sgml/ref/create_aggregate.sgml
index 222e0aa5c9d..0472ac2e874 100644
--- a/doc/src/sgml/ref/create_aggregate.sgml
+++ b/doc/src/sgml/ref/create_aggregate.sgml
@@ -384,9 +384,13 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state value.
If this parameter is omitted or is zero, a default estimate is used
- based on the <replaceable>state_data_type</replaceable>.
+ based on the <replaceable>state_data_type</replaceable>. If set to a
+ negative value, it indicates the state data can grow unboundedly in
+ size, such as when the aggregate accumulates input rows (e.g.,
+ array_agg, string_agg).
The planner uses this value to estimate the memory required for a
- grouped aggregate query.
+ grouped aggregate query and to avoid optimizations that may cause
+ excessive memory usage.
</para>
</listitem>
</varlistentry>
@@ -568,7 +572,8 @@ SELECT col FROM tab ORDER BY col USING sortop LIMIT 1;
<para>
The approximate average size (in bytes) of the aggregate's state
value, when using moving-aggregate mode. This works the same as
- <replaceable>state_data_size</replaceable>.
+ <replaceable>state_data_size</replaceable>, except that negative
+ values are not used to indicate unbounded state size.
</para>
</listitem>
</varlistentry>
diff --git a/src/include/catalog/pg_aggregate.dat b/src/include/catalog/pg_aggregate.dat
index d6aa1f6ec47..870769e8f14 100644
--- a/src/include/catalog/pg_aggregate.dat
+++ b/src/include/catalog/pg_aggregate.dat
@@ -558,26 +558,28 @@
aggfinalfn => 'array_agg_finalfn', aggcombinefn => 'array_agg_combine',
aggserialfn => 'array_agg_serialize',
aggdeserialfn => 'array_agg_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
{ aggfnoid => 'array_agg(anyarray)', aggtransfn => 'array_agg_array_transfn',
aggfinalfn => 'array_agg_array_finalfn',
aggcombinefn => 'array_agg_array_combine',
aggserialfn => 'array_agg_array_serialize',
aggdeserialfn => 'array_agg_array_deserialize', aggfinalextra => 't',
- aggtranstype => 'internal' },
+ aggtranstype => 'internal', aggtransspace => '-1' },
# text
{ aggfnoid => 'string_agg(text,text)', aggtransfn => 'string_agg_transfn',
aggfinalfn => 'string_agg_finalfn', aggcombinefn => 'string_agg_combine',
aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# bytea
{ aggfnoid => 'string_agg(bytea,bytea)',
aggtransfn => 'bytea_string_agg_transfn',
aggfinalfn => 'bytea_string_agg_finalfn',
aggcombinefn => 'string_agg_combine', aggserialfn => 'string_agg_serialize',
- aggdeserialfn => 'string_agg_deserialize', aggtranstype => 'internal' },
+ aggdeserialfn => 'string_agg_deserialize',
+ aggtranstype => 'internal', aggtransspace => '-1' },
# range
{ aggfnoid => 'range_intersect_agg(anyrange)',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 20bf9ea9cdf..a357e1d0c0e 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -1470,7 +1470,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
ctid | aggfnoid
------+----------
(0 rows)
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index 2fb3a852878..cd674d7dbca 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -847,7 +847,7 @@ WHERE aggfnoid = 0 OR aggtransfn = 0 OR
(aggkind = 'n' AND aggnumdirectargs > 0) OR
aggfinalmodify NOT IN ('r', 's', 'w') OR
aggmfinalmodify NOT IN ('r', 's', 'w') OR
- aggtranstype = 0 OR aggtransspace < 0 OR aggmtransspace < 0;
+ aggtranstype = 0 OR aggmtransspace < 0;
-- Make sure the matching pg_proc entry is sensible, too.
--
2.39.5 (Apple Git-154)
[application/octet-stream] v24-0002-Implement-Eager-Aggregation.patch (188.5K, 3-v24-0002-Implement-Eager-Aggregation.patch)
From d03a39b1a88bee1280fbdd61529eac428902b39e Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v24 2/2] Implement Eager Aggregation
Eager aggregation is a query optimization technique that partially
pushes aggregation past a join, and finalizes it once all the
relations are joined. Eager aggregation may reduce the number of
input rows to the join and thus could result in a better overall plan.
In the current planner architecture, the separation between the
scan/join planning phase and the post-scan/join phase means that
aggregation steps are not visible when constructing the join tree,
limiting the planner's ability to exploit aggregation-aware
optimizations. To implement eager aggregation, we collect information
about aggregate functions in the targetlist and HAVING clause, along
with grouping expressions from the GROUP BY clause, and store it in
the PlannerInfo node. During the scan/join planning phase, this
information is used to evaluate each base or join relation to
determine whether eager aggregation can be applied. If applicable, we
create a separate RelOptInfo, referred to as a grouped relation, to
represent the partially-aggregated version of the relation and
generate grouped paths for it.
Grouped relation paths can be generated in two ways. The first method
involves adding sorted and hashed partial aggregation paths on top of
the non-grouped paths. To limit planning time, we only consider the
cheapest or suitably-sorted non-grouped paths in this step.
Alternatively, grouped paths can be generated by joining a grouped
relation with a non-grouped relation. Joining two grouped relations
is currently not supported.
To further limit planning time, we currently adopt a strategy where
partial aggregation is pushed only to the lowest feasible level in the
join tree where it provides a significant reduction in row count.
This strategy also helps ensure that all grouped paths for the same
grouped relation produce the same set of rows, which is important to
support a fundamental assumption of the planner.
For the partial aggregation that is pushed down to a non-aggregated
relation, we need to consider all expressions from this relation that
are involved in upper join clauses and include them in the grouping
keys, using compatible operators. This is essential to ensure that an
aggregated row from the partial aggregation matches the other side of
the join if and only if each row in the partial group does. This
ensures that all rows within the same partial group share the same
"destiny", which is crucial for maintaining correctness.
One restriction is that we cannot push partial aggregation down to a
relation that is in the nullable side of an outer join, because the
NULL-extended rows produced by the outer join would not be available
when we perform the partial aggregation, while with a
non-eager-aggregation plan these rows are available for the top-level
aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
If we have generated a grouped relation for the topmost join relation,
we finalize its paths at the end. The final paths will compete in the
usual way with paths built from regular planning.
The patch was originally proposed by Antonin Houska in 2017. This
commit reworks various important aspects and rewrites most of the
current code. However, the original patch and reviews were very
useful.
Author: Richard Guo <[email protected]>
Author: Antonin Houska <[email protected]> (in an older version)
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Jian He <[email protected]>
Reviewed-by: Tender Wang <[email protected]>
Reviewed-by: Matheus Alcantara <[email protected]>
Reviewed-by: Tom Lane <[email protected]>
Reviewed-by: David Rowley <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]> (in an older version)
Reviewed-by: Andy Fan <[email protected]> (in an older version)
Reviewed-by: Ashutosh Bapat <[email protected]> (in an older version)
Discussion: https://postgr.es/m/CAMbWs48jzLrPt1J_00ZcPZXWUQKawQOFE8ROc-ADiYqsqrpBNw@mail.gmail.com
---
.../postgres_fdw/expected/postgres_fdw.out | 49 +-
doc/src/sgml/config.sgml | 31 +
src/backend/optimizer/README | 110 ++
src/backend/optimizer/geqo/geqo_eval.c | 21 +-
src/backend/optimizer/path/allpaths.c | 467 ++++-
src/backend/optimizer/path/joinrels.c | 193 ++
src/backend/optimizer/plan/initsplan.c | 370 ++++
src/backend/optimizer/plan/planmain.c | 9 +
src/backend/optimizer/plan/planner.c | 124 +-
src/backend/optimizer/util/appendinfo.c | 51 +
src/backend/optimizer/util/relnode.c | 650 +++++++
src/backend/utils/misc/guc_parameters.dat | 16 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/include/nodes/pathnodes.h | 117 ++
src/include/optimizer/pathnode.h | 4 +
src/include/optimizer/paths.h | 4 +
src/include/optimizer/planmain.h | 1 +
.../regress/expected/collate.icu.utf8.out | 32 +-
src/test/regress/expected/eager_aggregate.out | 1714 +++++++++++++++++
src/test/regress/expected/join.out | 12 +-
.../regress/expected/partition_aggregate.out | 2 +
src/test/regress/expected/sysviews.out | 3 +-
src/test/regress/parallel_schedule | 2 +-
src/test/regress/sql/eager_aggregate.sql | 380 ++++
src/test/regress/sql/partition_aggregate.sql | 2 +
src/tools/pgindent/typedefs.list | 3 +
26 files changed, 4293 insertions(+), 76 deletions(-)
create mode 100644 src/test/regress/expected/eager_aggregate.out
create mode 100644 src/test/regress/sql/eager_aggregate.sql
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6dc04e916dc..f5a57b9cbd5 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -3701,30 +3701,33 @@ select count(t1.c3) from ft2 t1 left join ft2 t2 on (t1.c1 = random() * t2.c2);
-- Subquery in FROM clause having aggregate
explain (verbose, costs off)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
- QUERY PLAN
------------------------------------------------------------------------------------------------
+ QUERY PLAN
+-----------------------------------------------------------------------------------------
Sort
- Output: (count(*)), x.b
- Sort Key: (count(*)), x.b
- -> HashAggregate
- Output: count(*), x.b
- Group Key: x.b
- -> Hash Join
- Output: x.b
- Inner Unique: true
- Hash Cond: (ft1.c2 = x.a)
- -> Foreign Scan on public.ft1
- Output: ft1.c2
- Remote SQL: SELECT c2 FROM "S 1"."T 1"
- -> Hash
- Output: x.b, x.a
- -> Subquery Scan on x
- Output: x.b, x.a
- -> Foreign Scan
- Output: ft1_1.c2, (sum(ft1_1.c1))
- Relations: Aggregate on (public.ft1 ft1_1)
- Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
-(21 rows)
+ Output: (count(*)), (sum(ft1_1.c1))
+ Sort Key: (count(*)), (sum(ft1_1.c1))
+ -> Finalize GroupAggregate
+ Output: count(*), (sum(ft1_1.c1))
+ Group Key: (sum(ft1_1.c1))
+ -> Sort
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Sort Key: (sum(ft1_1.c1))
+ -> Hash Join
+ Output: (sum(ft1_1.c1)), (PARTIAL count(*))
+ Hash Cond: (ft1_1.c2 = ft1.c2)
+ -> Foreign Scan
+ Output: ft1_1.c2, (sum(ft1_1.c1))
+ Relations: Aggregate on (public.ft1 ft1_1)
+ Remote SQL: SELECT c2, sum("C 1") FROM "S 1"."T 1" GROUP BY 1
+ -> Hash
+ Output: ft1.c2, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: ft1.c2, PARTIAL count(*)
+ Group Key: ft1.c2
+ -> Foreign Scan on public.ft1
+ Output: ft1.c2
+ Remote SQL: SELECT c2 FROM "S 1"."T 1"
+(24 rows)
select count(*), x.b from ft1, (select c2 a, sum(c1) b from ft1 group by c2) x where ft1.c2 = x.a group by x.b order by 1, 2;
count | b
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e9b420f3ddb..39e658b7808 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5475,6 +5475,21 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-eager-aggregate" xreflabel="enable_eager_aggregate">
+ <term><varname>enable_eager_aggregate</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_eager_aggregate</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's ability to partially push
+ aggregation past a join, and finalize it once all the relations are
+ joined. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-gathermerge" xreflabel="enable_gathermerge">
<term><varname>enable_gathermerge</varname> (<type>boolean</type>)
<indexterm>
@@ -6095,6 +6110,22 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-min-eager-agg-group-size" xreflabel="min_eager_agg_group_size">
+ <term><varname>min_eager_agg_group_size</varname> (<type>floating point</type>)
+ <indexterm>
+ <primary><varname>min_eager_agg_group_size</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Sets the minimum average group size required to consider applying
+ eager aggregation. This helps avoid the overhead of eager
+ aggregation when it does not offer significant row count reduction.
+ The default is <literal>8</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-jit-above-cost" xreflabel="jit_above_cost">
<term><varname>jit_above_cost</varname> (<type>floating point</type>)
<indexterm>
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 843368096fd..6c35baceedb 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1500,3 +1500,113 @@ breaking down aggregation or grouping over a partitioned relation into
aggregation or grouping over its partitions is called partitionwise
aggregation. Especially when the partition keys match the GROUP BY clause,
this can be significantly faster than the regular method.
+
+Eager aggregation
+-----------------
+
+Eager aggregation is a query optimization technique that partially
+pushes aggregation past a join, and finalizes it once all the
+relations are joined. Eager aggregation may reduce the number of
+input rows to the join and thus could result in a better overall plan.
+
+To prove that the transformation is correct, let's first consider the
+case where only inner joins are involved. In this case, we partition
+the tables in the FROM clause into two groups: those that contain at
+least one aggregation column, and those that do not contain any
+aggregation columns. Each group can be treated as a single relation
+formed by the Cartesian product of the tables within that group.
+Therefore, without loss of generality, we can assume that the FROM
+clause contains exactly two relations, R1 and R2, where R1 represents
+the relation containing all aggregation columns, and R2 represents the
+relation without any aggregation columns.
+
+Let the query be of the form:
+
+SELECT G, AGG(A)
+FROM R1 JOIN R2 ON J
+GROUP BY G;
+
+where G is the set of grouping keys that may include columns from R1
+and/or R2; AGG(A) is an aggregate function over columns A from R1; J
+is the join condition between R1 and R2.
+
+The transformation of eager aggregation is:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 ON J)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 ON J)
+
+This equivalence holds under the following conditions:
+
+1) AGG is decomposable, meaning that it can be computed in two stages:
+a partial aggregation followed by a final aggregation;
+2) The set G1 used in the pre-aggregation of R1 includes:
+ * all columns from R1 that are part of the grouping keys G, and
+ * all columns from R1 that appear in the join condition J.
+3) The grouping operator for any column in G1 must be compatible with
+the operator used for that column in the join condition J.
+
+Since G1 includes all columns from R1 that appear in either the
+grouping keys G or the join condition J, all rows within each partial
+group have identical values for both the grouping keys and the
+join-relevant columns from R1, assuming compatible operators are used.
+As a result, the rows within a partial group are indistinguishable in
+terms of their contribution to the aggregation and their behavior in
+the join. This ensures that all rows in the same partial group share
+the same "destiny": they either all match or all fail to match a given
+row in R2. Because the aggregate function AGG is decomposable,
+aggregating the partial results after the join yields the same final
+result as aggregating after the full join, thereby preserving query
+semantics. Q.E.D.
+
+In the case where there are any outer joins, the situation becomes
+more complex due to join order constraints and the semantics of
+null-extension in outer joins. If the relations that contain at least
+one aggregation column cannot be treated as a single relation because
+of the join order constraints, partial aggregation paths will not be
+generated, and thus the transformation is not applicable. Otherwise,
+let R1 be the relation containing all aggregation columns, and R2, R3,
+... be the remaining relations. From the inner join case, under the
+aforementioned conditions, we have the equivalence:
+
+ GROUP BY G, AGG(A) on (R1 JOIN R2 JOIN R3 ...)
+ =
+ GROUP BY G, AGG(agg_A) on ((GROUP BY G1, AGG(A) AS agg_A on R1) JOIN R2 JOIN R3 ...)
+
+To preserve correctness when outer joins are involved, we require an
+additional condition:
+
+4) R1 must not be on the nullable side of any outer join.
+
+This condition ensures that partial aggregation over R1 does not
+suppress any null-extended rows that would be introduced by outer
+joins. If R1 is on the nullable side of an outer join, the
+NULL-extended rows produced by the outer join would not be available
+when we perform the partial aggregation, while with a
+non-eager-aggregation plan these rows are available for the top-level
+aggregation. Pushing partial aggregation in this case may result in
+the rows being grouped differently than expected, or produce incorrect
+values from the aggregate functions.
+
+During the construction of the join tree, we evaluate each base or
+join relation to determine if eager aggregation can be applied. If
+feasible, we create a separate RelOptInfo called a "grouped relation"
+and generate grouped paths by adding sorted and hashed partial
+aggregation paths on top of the non-grouped paths. To limit planning
+time, we consider only the cheapest or suitably-sorted non-grouped
+paths in this step.
+
+Another way to generate grouped paths is to join a grouped relation
+with a non-grouped relation. Joining two grouped relations is
+currently not supported.
+
+To further limit planning time, we currently adopt a strategy where
+partial aggregation is pushed only to the lowest feasible level in the
+join tree where it provides a significant reduction in row count.
+This strategy also helps ensure that all grouped paths for the same
+grouped relation produce the same set of rows, which is important to
+support a fundamental assumption of the planner.
+
+If we have generated a grouped relation for the topmost join relation,
+we need to finalize its paths at the end. The final paths will
+compete in the usual way with paths built from regular planning.
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c
index f07d1dc8ac6..e39c5da63eb 100644
--- a/src/backend/optimizer/geqo/geqo_eval.c
+++ b/src/backend/optimizer/geqo/geqo_eval.c
@@ -264,6 +264,9 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
/* Keep searching if join order is not valid */
if (joinrel)
{
+ bool is_top_rel = bms_equal(joinrel->relids,
+ root->all_query_rels);
+
/* Create paths for partitionwise joins. */
generate_partitionwise_join_paths(root, joinrel);
@@ -273,12 +276,28 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene,
* rel once we know the final targetlist (see
* grouping_planner).
*/
- if (!bms_equal(joinrel->relids, root->all_query_rels))
+ if (!is_top_rel)
generate_useful_gather_paths(root, joinrel, false);
/* Find and save the cheapest paths for this joinrel */
set_cheapest(joinrel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top
+ * of the paths of this rel. After that, we're done creating
+ * paths for the grouped relation, so run set_cheapest().
+ */
+ if (joinrel->grouped_rel != NULL && !is_top_rel)
+ {
+ RelOptInfo *grouped_rel = joinrel->grouped_rel;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, joinrel);
+ set_cheapest(grouped_rel);
+ }
+
/* Absorb new clump into old */
old_clump->joinrel = joinrel;
old_clump->size += new_clump->size;
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d7ff36d89be..cc562518b04 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -40,6 +40,7 @@
#include "optimizer/paths.h"
#include "optimizer/plancat.h"
#include "optimizer/planner.h"
+#include "optimizer/prep.h"
#include "optimizer/tlist.h"
#include "parser/parse_clause.h"
#include "parser/parsetree.h"
@@ -47,6 +48,7 @@
#include "port/pg_bitutils.h"
#include "rewrite/rewriteManip.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
/* Bitmask flags for pushdown_safety_info.unsafeFlags */
@@ -77,7 +79,9 @@ typedef enum pushdown_safe_type
/* These parameters are set by GUC */
bool enable_geqo = false; /* just in case GUC doesn't set it */
+bool enable_eager_aggregate = true;
int geqo_threshold;
+double min_eager_agg_group_size;
int min_parallel_table_scan_size;
int min_parallel_index_scan_size;
@@ -90,6 +94,7 @@ join_search_hook_type join_search_hook = NULL;
static void set_base_rel_consider_startup(PlannerInfo *root);
static void set_base_rel_sizes(PlannerInfo *root);
+static void setup_simple_grouped_rels(PlannerInfo *root);
static void set_base_rel_pathlists(PlannerInfo *root);
static void set_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
@@ -114,6 +119,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
Index rti, RangeTblEntry *rte);
+static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel);
static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel,
List *live_childrels,
List *all_child_pathkeys);
@@ -182,6 +188,12 @@ make_one_rel(PlannerInfo *root, List *joinlist)
*/
set_base_rel_sizes(root);
+ /*
+ * Build grouped relations for simple rels (i.e., base or "other" member
+ * relations) where possible.
+ */
+ setup_simple_grouped_rels(root);
+
/*
* We should now have size estimates for every actual table involved in
* the query, and we also know which if any have been deleted from the
@@ -323,6 +335,39 @@ set_base_rel_sizes(PlannerInfo *root)
}
}
+/*
+ * setup_simple_grouped_rels
+ * For each simple relation, build a grouped simple relation if eager
+ * aggregation is possible and if this relation can produce grouped paths.
+ */
+static void
+setup_simple_grouped_rels(PlannerInfo *root)
+{
+ Index rti;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ for (rti = 1; rti < root->simple_rel_array_size; rti++)
+ {
+ RelOptInfo *rel = root->simple_rel_array[rti];
+
+ /* there may be empty slots corresponding to non-baserel RTEs */
+ if (rel == NULL)
+ continue;
+
+ Assert(rel->relid == rti); /* sanity check on array */
+ Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */
+
+ (void) build_simple_grouped_rel(root, rel);
+ }
+}
+
/*
* set_base_rel_pathlists
* Finds all paths available for scanning each base-relation entry.
@@ -559,6 +604,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
/* Now find the cheapest of the paths for this rel */
set_cheapest(rel);
+ /*
+ * If a grouped relation for this rel exists, build partial aggregation
+ * paths for it.
+ *
+ * Note that this can only happen after we've called set_cheapest() for
+ * this base rel, because we need its cheapest paths.
+ */
+ set_grouped_rel_pathlist(root, rel);
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -1305,6 +1359,35 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
add_paths_to_append_rel(root, rel, live_childrels);
}
+/*
+ * set_grouped_rel_pathlist
+ * If a grouped relation for the given 'rel' exists, build partial
+ * aggregation paths for it.
+ */
+static void
+set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Add paths to the grouped base relation if one exists. */
+ grouped_rel = rel->grouped_rel;
+ if (grouped_rel)
+ {
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel);
+ set_cheapest(grouped_rel);
+ }
+}
+
/*
* add_paths_to_append_rel
@@ -3332,6 +3415,345 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r
}
}
+/*
+ * generate_grouped_paths
+ * Generate paths for a grouped relation by adding sorted and hashed
+ * partial aggregation paths on top of paths of the ungrouped relation.
+ *
+ * The information needed is provided by the RelAggInfo structure stored in
+ * "grouped_rel".
+ */
+void
+generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel)
+{
+ RelAggInfo *agg_info = grouped_rel->agg_info;
+ AggClauseCosts agg_costs;
+ bool can_hash;
+ bool can_sort;
+ Path *cheapest_total_path = NULL;
+ Path *cheapest_partial_path = NULL;
+ double dNumGroups = 0;
+ double dNumPartialGroups = 0;
+ List *group_pathkeys = NIL;
+
+ if (IS_DUMMY_REL(rel))
+ {
+ mark_dummy_rel(grouped_rel);
+ return;
+ }
+
+ /*
+ * We push partial aggregation only to the lowest possible level in the
+ * join tree that is deemed useful.
+ */
+ if (!bms_equal(agg_info->apply_at, rel->relids) ||
+ !agg_info->agg_useful)
+ return;
+
+ MemSet(&agg_costs, 0, sizeof(AggClauseCosts));
+ get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs);
+
+ /*
+ * Determine whether it's possible to perform sort-based implementations
+ * of grouping, and generate the pathkeys that represent the grouping
+ * requirements in that case.
+ */
+ can_sort = grouping_is_sortable(agg_info->group_clauses);
+ if (can_sort)
+ {
+ RelOptInfo *top_grouped_rel;
+ List *top_group_tlist;
+
+ top_grouped_rel = IS_OTHER_REL(rel) ?
+ rel->top_parent->grouped_rel : grouped_rel;
+ top_group_tlist =
+ make_tlist_from_pathtarget(top_grouped_rel->agg_info->target);
+
+ group_pathkeys =
+ make_pathkeys_for_sortclauses(root, agg_info->group_clauses,
+ top_group_tlist);
+ }
+
+ /*
+ * Determine whether we should consider hash-based implementations of
+ * grouping.
+ */
+ Assert(root->numOrderedAggs == 0);
+ can_hash = (agg_info->group_clauses != NIL &&
+ grouping_is_hashable(agg_info->group_clauses));
+
+ /*
+ * Consider whether we should generate partially aggregated non-partial
+ * paths. We can only do this if we have a non-partial path.
+ */
+ if (rel->pathlist != NIL)
+ {
+ cheapest_total_path = rel->cheapest_total_path;
+ Assert(cheapest_total_path != NULL);
+ }
+
+ /*
+ * If parallelism is possible for grouped_rel, then we should consider
+ * generating partially-grouped partial paths. However, if the ungrouped
+ * rel has no partial paths, then we can't.
+ */
+ if (grouped_rel->consider_parallel && rel->partial_pathlist != NIL)
+ {
+ cheapest_partial_path = linitial(rel->partial_pathlist);
+ Assert(cheapest_partial_path != NULL);
+ }
+
+ /* Estimate number of partial groups. */
+ if (cheapest_total_path != NULL)
+ dNumGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_total_path->rows,
+ NULL, NULL);
+ if (cheapest_partial_path != NULL)
+ dNumPartialGroups = estimate_num_groups(root,
+ agg_info->group_exprs,
+ cheapest_partial_path->rows,
+ NULL, NULL);
+
+ if (can_sort && cheapest_total_path != NULL)
+ {
+ ListCell *lc;
+
+ /*
+ * Use any available suitably-sorted path as input, and also consider
+ * sorting the cheapest-total path and incremental sort on any paths
+ * with presorted keys.
+ *
+ * To save planning time, we ignore parameterized input paths unless
+ * they are the cheapest-total path.
+ */
+ foreach(lc, rel->pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ /*
+ * Ignore parameterized paths that are not the cheapest-total
+ * path.
+ */
+ if (input_path->param_info &&
+ input_path != cheapest_total_path)
+ continue;
+
+ is_sorted = pathkeys_count_contained_in(group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest total path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_total_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+ }
+
+ if (can_sort && cheapest_partial_path != NULL)
+ {
+ ListCell *lc;
+
+ /* Similar to above logic, but for partial paths. */
+ foreach(lc, rel->partial_pathlist)
+ {
+ Path *input_path = (Path *) lfirst(lc);
+ Path *path;
+ bool is_sorted;
+ int presorted_keys;
+
+ is_sorted = pathkeys_count_contained_in(group_pathkeys,
+ input_path->pathkeys,
+ &presorted_keys);
+
+ /*
+ * Ignore paths that are not suitably or partially sorted, unless
+ * they are the cheapest partial path (no need to deal with paths
+ * which have presorted keys when incremental sort is disabled).
+ */
+ if (!is_sorted && input_path != cheapest_partial_path &&
+ (presorted_keys == 0 || !enable_incremental_sort))
+ continue;
+
+ /*
+ * Since the path originates from a non-grouped relation that is
+ * not aware of eager aggregation, we must ensure that it provides
+ * the correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ input_path,
+ agg_info->agg_input);
+
+ if (!is_sorted)
+ {
+ /*
+ * We've no need to consider both a sort and incremental sort.
+ * We'll just do a sort if there are no presorted keys and an
+ * incremental sort when there are presorted keys.
+ */
+ if (presorted_keys == 0 || !enable_incremental_sort)
+ path = (Path *) create_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ -1.0);
+ else
+ path = (Path *) create_incremental_sort_path(root,
+ grouped_rel,
+ path,
+ group_pathkeys,
+ presorted_keys,
+ -1.0);
+ }
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until
+ * the final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_SORTED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+ }
+
+ /*
+ * Add a partially-grouped HashAgg Path where possible
+ */
+ if (can_hash && cheapest_total_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_total_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumGroups);
+
+ add_path(grouped_rel, path);
+ }
+
+ /*
+ * Now add a partially-grouped HashAgg partial Path where possible
+ */
+ if (can_hash && cheapest_partial_path != NULL)
+ {
+ Path *path;
+
+ /*
+ * Since the path originates from a non-grouped relation that is not
+ * aware of eager aggregation, we must ensure that it provides the
+ * correct input for partial aggregation.
+ */
+ path = (Path *) create_projection_path(root,
+ grouped_rel,
+ cheapest_partial_path,
+ agg_info->agg_input);
+
+ /*
+ * qual is NIL because the HAVING clause cannot be evaluated until the
+ * final value of the aggregate is known.
+ */
+ path = (Path *) create_agg_path(root,
+ grouped_rel,
+ path,
+ agg_info->target,
+ AGG_HASHED,
+ AGGSPLIT_INITIAL_SERIAL,
+ agg_info->group_clauses,
+ NIL,
+ &agg_costs,
+ dNumPartialGroups);
+
+ add_partial_path(grouped_rel, path);
+ }
+}
+
/*
* make_rel_from_joinlist
* Build access paths using a "joinlist" to guide the join path search.
@@ -3491,11 +3913,19 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
*
* After that, we're done creating paths for the joinrel, so run
* set_cheapest().
+ *
+ * In addition, we also run generate_grouped_paths() for the grouped
+ * relation of each just-processed joinrel, and run set_cheapest() for
+ * the grouped relation afterwards.
*/
foreach(lc, root->join_rel_level[lev])
{
+ bool is_top_rel;
+
rel = (RelOptInfo *) lfirst(lc);
+ is_top_rel = bms_equal(rel->relids, root->all_query_rels);
+
/* Create paths for partitionwise joins. */
generate_partitionwise_join_paths(root, rel);
@@ -3505,12 +3935,28 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
* once we know the final targetlist (see grouping_planner's and
* its call to apply_scanjoin_target_to_paths).
*/
- if (!bms_equal(rel->relids, root->all_query_rels))
+ if (!is_top_rel)
generate_useful_gather_paths(root, rel, false);
/* Find and save the cheapest paths for this rel */
set_cheapest(rel);
+ /*
+ * Except for the topmost scan/join rel, consider generating
+ * partial aggregation paths for the grouped relation on top of
+ * the paths of this rel. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (rel->grouped_rel != NULL && !is_top_rel)
+ {
+ RelOptInfo *grouped_rel = rel->grouped_rel;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, rel);
+ set_cheapest(grouped_rel);
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(rel);
#endif
@@ -4380,6 +4826,25 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel)
if (IS_DUMMY_REL(child_rel))
continue;
+ /*
+ * Except for the topmost scan/join rel, consider generating partial
+ * aggregation paths for the grouped relation on top of the paths of
+ * this partitioned child-join. After that, we're done creating paths
+ * for the grouped relation, so run set_cheapest().
+ */
+ if (child_rel->grouped_rel != NULL &&
+ !bms_equal(IS_OTHER_REL(rel) ?
+ rel->top_parent_relids : rel->relids,
+ root->all_query_rels))
+ {
+ RelOptInfo *grouped_rel = child_rel->grouped_rel;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ generate_grouped_paths(root, grouped_rel, child_rel);
+ set_cheapest(grouped_rel);
+ }
+
#ifdef OPTIMIZER_DEBUG
pprint(child_rel);
#endif
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 535248aa525..43b84d239ed 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -16,6 +16,7 @@
#include "miscadmin.h"
#include "optimizer/appendinfo.h"
+#include "optimizer/cost.h"
#include "optimizer/joininfo.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
@@ -36,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel);
static bool restriction_is_constant_false(List *restrictlist,
RelOptInfo *joinrel,
bool only_pushed_down);
+static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist);
static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1,
RelOptInfo *rel2, RelOptInfo *joinrel,
SpecialJoinInfo *sjinfo, List *restrictlist);
@@ -762,6 +766,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2)
return joinrel;
}
+ /* Build a grouped join relation for 'joinrel' if possible. */
+ make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo,
+ restrictlist);
+
/* Add paths to the join relation. */
populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo,
restrictlist);
@@ -873,6 +881,186 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids,
return input_relids;
}
+/*
+ * make_grouped_join_rel
+ * Build a grouped join relation for the given "joinrel" if eager
+ * aggregation is applicable and the resulting grouped paths are considered
+ * useful.
+ *
+ * There are two strategies for generating grouped paths for a join relation:
+ *
+ * 1. Join a grouped (partially aggregated) input relation with a non-grouped
+ * input (e.g., AGG(B) JOIN A).
+ *
+ * 2. Apply partial aggregation (sorted or hashed) on top of existing
+ * non-grouped join paths (e.g., AGG(A JOIN B)).
+ *
+ * To limit planning effort and avoid an explosion of alternatives, we adopt a
+ * strategy where partial aggregation is only pushed to the lowest possible
+ * level in the join tree that is deemed useful. That is, if grouped paths can
+ * be built using the first strategy, we skip consideration of the second
+ * strategy for the same join level.
+ *
+ * Additionally, if there are multiple lowest useful levels where partial
+ * aggregation could be applied, such as in a join tree with relations A, B,
+ * and C where both "AGG(A JOIN B) JOIN C" and "A JOIN AGG(B JOIN C)" are valid
+ * placements, we choose only the first one encountered during join search.
+ * This avoids generating multiple versions of the same grouped relation based
+ * on different aggregation placements.
+ *
+ * These heuristics also ensure that all grouped paths for the same grouped
+ * relation produce the same set of rows, which is a basic assumption in the
+ * planner.
+ */
+static void
+make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1,
+ RelOptInfo *rel2, RelOptInfo *joinrel,
+ SpecialJoinInfo *sjinfo, List *restrictlist)
+{
+ RelOptInfo *grouped_rel;
+ RelOptInfo *grouped_rel1;
+ RelOptInfo *grouped_rel2;
+ bool rel1_empty;
+ bool rel2_empty;
+ Relids agg_apply_at;
+
+ /*
+ * If there are no aggregate expressions or grouping expressions, eager
+ * aggregation is not possible.
+ */
+ if (root->agg_clause_list == NIL ||
+ root->group_expr_list == NIL)
+ return;
+
+ /* Retrieve the grouped relations for the two input rels */
+ grouped_rel1 = rel1->grouped_rel;
+ grouped_rel2 = rel2->grouped_rel;
+
+ rel1_empty = (grouped_rel1 == NULL || IS_DUMMY_REL(grouped_rel1));
+ rel2_empty = (grouped_rel2 == NULL || IS_DUMMY_REL(grouped_rel2));
+
+ /* Find or construct a grouped joinrel for this joinrel */
+ grouped_rel = joinrel->grouped_rel;
+ if (grouped_rel == NULL)
+ {
+ RelAggInfo *agg_info = NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this
+ * join relation.
+ */
+ agg_info = create_rel_agg_info(root, joinrel, rel1_empty == rel2_empty);
+ if (agg_info == NULL)
+ return;
+
+ /*
+ * If grouped paths for the given join relation are not considered
+ * useful, and no grouped paths can be built by joining grouped input
+ * relations, skip building the grouped join relation.
+ */
+ if (!agg_info->agg_useful &&
+ (rel1_empty == rel2_empty))
+ return;
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, joinrel);
+ grouped_rel->reltarget = agg_info->target;
+
+ if (rel1_empty != rel2_empty)
+ {
+ /*
+ * If there is exactly one grouped input relation, then we can
+ * build grouped paths by joining the input relations. Set size
+ * estimates for the grouped join relation based on the input
+ * relations, and update the set of relids where partial
+ * aggregation is applied to that of the grouped input relation.
+ */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ agg_info->apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+ }
+ else
+ {
+ /*
+ * Otherwise, grouped paths can be built by applying partial
+ * aggregation on top of existing non-grouped join paths. Set
+ * size estimates for the grouped join relation based on the
+ * estimated number of groups, and track the set of relids where
+ * partial aggregation is applied. Note that these values may be
+ * updated later if it is determined that grouped paths can be
+ * constructed by joining other input relations.
+ */
+ grouped_rel->rows = agg_info->grouped_rows;
+ agg_info->apply_at = bms_copy(joinrel->relids);
+ }
+
+ grouped_rel->agg_info = agg_info;
+ joinrel->grouped_rel = grouped_rel;
+ }
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* We may have already proven this grouped join relation to be dummy. */
+ if (IS_DUMMY_REL(grouped_rel))
+ return;
+
+ /*
+ * Nothing to do if there's no grouped input relation. Also, joining two
+ * grouped relations is not currently supported.
+ */
+ if (rel1_empty == rel2_empty)
+ return;
+
+ /*
+ * Get the set of relids where partial aggregation is applied among the
+ * given input relations.
+ */
+ agg_apply_at = rel1_empty ?
+ grouped_rel2->agg_info->apply_at :
+ grouped_rel1->agg_info->apply_at;
+
+ /*
+ * If it's not the designated level, skip building grouped paths.
+ *
+ * One exception is when it is a subset of the previously recorded level.
+ * In that case, we need to update the designated level to this one, and
+ * adjust the size estimates for the grouped join relation accordingly.
+ * For example, suppose partial aggregation can be applied on top of (B
+ * JOIN C). If we first construct the join as ((A JOIN B) JOIN C), we'd
+ * record the designated level as including all three relations (A B C).
+ * Later, when we consider (A JOIN (B JOIN C)), we encounter the smaller
+ * (B C) join level directly. Since this is a subset of the previous
+ * level and still valid for partial aggregation, we update the designated
+ * level to (B C), and adjust the size estimates accordingly.
+ */
+ if (!bms_equal(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ if (bms_is_subset(agg_apply_at, grouped_rel->agg_info->apply_at))
+ {
+ /* Adjust the size estimates for the grouped join relation. */
+ set_joinrel_size_estimates(root, grouped_rel,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ sjinfo, restrictlist);
+ grouped_rel->agg_info->apply_at = agg_apply_at;
+ }
+ else
+ return;
+ }
+
+ /* Make paths for the grouped join relation. */
+ populate_joinrel_with_paths(root,
+ rel1_empty ? rel1 : grouped_rel1,
+ rel2_empty ? rel2 : grouped_rel2,
+ grouped_rel,
+ sjinfo,
+ restrictlist);
+}
+
/*
* populate_joinrel_with_paths
* Add paths to the given joinrel for given pair of joining relations. The
@@ -1615,6 +1803,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
adjust_child_relids(joinrel->relids,
nappinfos, appinfos)));
+ /* Build a grouped join relation for 'child_joinrel' if possible */
+ make_grouped_join_rel(root, child_rel1, child_rel2,
+ child_joinrel, child_sjinfo,
+ child_restrictlist);
+
/* And make paths for the child join */
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index 3e3fec89252..b8d1c7e88a3 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -14,6 +14,7 @@
*/
#include "postgres.h"
+#include "access/nbtree.h"
#include "catalog/pg_constraint.h"
#include "catalog/pg_type.h"
#include "nodes/makefuncs.h"
@@ -31,6 +32,7 @@
#include "optimizer/restrictinfo.h"
#include "parser/analyze.h"
#include "rewrite/rewriteManip.h"
+#include "utils/fmgroids.h"
#include "utils/lsyscache.h"
#include "utils/rel.h"
#include "utils/typcache.h"
@@ -81,6 +83,12 @@ typedef struct JoinTreeItem
} JoinTreeItem;
+static bool is_partial_agg_memory_risky(PlannerInfo *root);
+static void create_agg_clause_infos(PlannerInfo *root);
+static void create_grouping_expr_infos(PlannerInfo *root);
+static EquivalenceClass *get_eclass_for_sortgroupclause(PlannerInfo *root,
+ SortGroupClause *sgc,
+ Expr *expr);
static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel,
Index rtindex);
static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode,
@@ -628,6 +636,368 @@ remove_useless_groupby_columns(PlannerInfo *root)
}
}
+/*
+ * setup_eager_aggregation
+ * Check if eager aggregation is applicable, and if so collect suitable
+ * aggregate expressions and grouping expressions in the query.
+ */
+void
+setup_eager_aggregation(PlannerInfo *root)
+{
+ /*
+ * Don't apply eager aggregation if disabled by user.
+ */
+ if (!enable_eager_aggregate)
+ return;
+
+ /*
+ * Don't apply eager aggregation if there are no available GROUP BY
+ * clauses.
+ */
+ if (!root->processed_groupClause)
+ return;
+
+ /*
+ * For now we don't try to support grouping sets.
+ */
+ if (root->parse->groupingSets)
+ return;
+
+ /*
+ * For now we don't try to support DISTINCT or ORDER BY aggregates.
+ */
+ if (root->numOrderedAggs > 0)
+ return;
+
+ /*
+ * If there are any aggregates that do not support partial mode, or any
+ * partial aggregates that are non-serializable, do not apply eager
+ * aggregation.
+ */
+ if (root->hasNonPartialAggs || root->hasNonSerialAggs)
+ return;
+
+ /*
+ * We don't try to apply eager aggregation if there are set-returning
+ * functions in targetlist.
+ */
+ if (root->parse->hasTargetSRFs)
+ return;
+
+ /*
+ * Eager aggregation only makes sense if there are multiple base rels in
+ * the query.
+ */
+ if (bms_membership(root->all_baserels) != BMS_MULTIPLE)
+ return;
+
+ /*
+ * Don't apply eager aggregation if any aggregate poses a risk of
+ * excessive memory usage during partial aggregation.
+ */
+ if (is_partial_agg_memory_risky(root))
+ return;
+
+ /*
+ * Collect aggregate expressions and plain Vars that appear in the
+ * targetlist and havingQual.
+ */
+ create_agg_clause_infos(root);
+
+ /*
+ * If there are no suitable aggregate expressions, we cannot apply eager
+ * aggregation.
+ */
+ if (root->agg_clause_list == NIL)
+ return;
+
+ /*
+ * Collect grouping expressions that appear in grouping clauses.
+ */
+ create_grouping_expr_infos(root);
+}
+
+/*
+ * is_partial_agg_memory_risky
+ * Check if any aggregate poses a risk of excessive memory usage during
+ * partial aggregation.
+ *
+ * We check if any aggregate has a negative aggtransspace value, which
+ * indicates that its transition state data can grow unboundedly in size.
+ * Applying eager aggregation in such cases risks high memory usage since
+ * partial aggregation results might be stored in join hash tables or
+ * materialized nodes.
+ */
+static bool
+is_partial_agg_memory_risky(PlannerInfo *root)
+{
+ ListCell *lc;
+
+ foreach(lc, root->aggtransinfos)
+ {
+ AggTransInfo *transinfo = lfirst_node(AggTransInfo, lc);
+
+ if (transinfo->aggtransspace < 0)
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * create_agg_clause_infos
+ * Search the targetlist and havingQual for Aggrefs and plain Vars, and
+ * create an AggClauseInfo for each Aggref node.
+ */
+static void
+create_agg_clause_infos(PlannerInfo *root)
+{
+ List *tlist_exprs;
+ List *agg_clause_list = NIL;
+ List *tlist_vars = NIL;
+ Relids aggregate_relids = NULL;
+ bool eager_agg_applicable = true;
+ ListCell *lc;
+
+ Assert(root->agg_clause_list == NIL);
+ Assert(root->tlist_vars == NIL);
+
+ tlist_exprs = pull_var_clause((Node *) root->processed_tlist,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ /*
+ * Aggregates within the HAVING clause need to be processed in the same
+ * way as those in the targetlist. Note that HAVING can contain Aggrefs
+ * but not WindowFuncs.
+ */
+ if (root->parse->havingQual != NULL)
+ {
+ List *having_exprs;
+
+ having_exprs = pull_var_clause((Node *) root->parse->havingQual,
+ PVC_INCLUDE_AGGREGATES |
+ PVC_RECURSE_PLACEHOLDERS);
+ if (having_exprs != NIL)
+ {
+ tlist_exprs = list_concat(tlist_exprs, having_exprs);
+ list_free(having_exprs);
+ }
+ }
+
+ foreach(lc, tlist_exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Aggref *aggref;
+ Relids agg_eval_at;
+ AggClauseInfo *ac_info;
+
+ /* For now we don't try to support GROUPING() expressions */
+ if (IsA(expr, GroupingFunc))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* Collect plain Vars for future reference */
+ if (IsA(expr, Var))
+ {
+ tlist_vars = list_append_unique(tlist_vars, expr);
+ continue;
+ }
+
+ aggref = castNode(Aggref, expr);
+
+ Assert(aggref->aggorder == NIL);
+ Assert(aggref->aggdistinct == NIL);
+
+ /*
+ * If there are any securityQuals, do not try to apply eager
+ * aggregation if any non-leakproof aggregate functions are present.
+ * This is overly strict, but for now...
+ */
+ if (root->qual_security_level > 0 &&
+ !get_func_leakproof(aggref->aggfnoid))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ agg_eval_at = pull_varnos(root, (Node *) aggref);
+
+ /*
+ * If all base relations in the query are referenced by aggregate
+ * functions, then eager aggregation is not applicable.
+ */
+ aggregate_relids = bms_add_members(aggregate_relids, agg_eval_at);
+ if (bms_is_subset(root->all_baserels, aggregate_relids))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
+ /* OK, create the AggClauseInfo node */
+ ac_info = makeNode(AggClauseInfo);
+ ac_info->aggref = aggref;
+ ac_info->agg_eval_at = agg_eval_at;
+
+ /* ... and add it to the list */
+ agg_clause_list = list_append_unique(agg_clause_list, ac_info);
+ }
+
+ list_free(tlist_exprs);
+
+ if (eager_agg_applicable)
+ {
+ root->agg_clause_list = agg_clause_list;
+ root->tlist_vars = tlist_vars;
+ }
+ else
+ {
+ list_free_deep(agg_clause_list);
+ list_free(tlist_vars);
+ }
+}
+
+/*
+ * create_grouping_expr_infos
+ * Create a GroupingExprInfo for each expression usable as grouping key.
+ *
+ * If any grouping expression is not suitable, we will just return with
+ * root->group_expr_list being NIL.
+ */
+static void
+create_grouping_expr_infos(PlannerInfo *root)
+{
+ List *exprs = NIL;
+ List *sortgrouprefs = NIL;
+ List *ecs = NIL;
+ ListCell *lc,
+ *lc1,
+ *lc2,
+ *lc3;
+
+ Assert(root->group_expr_list == NIL);
+
+ foreach(lc, root->processed_groupClause)
+ {
+ SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
+ TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist);
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ Assert(tle->ressortgroupref > 0);
+
+ /*
+ * For now we only support plain Vars as grouping expressions.
+ */
+ if (!IsA(tle->expr, Var))
+ return;
+
+ /*
+ * Eager aggregation is only possible if equality implies image
+ * equality for each grouping key. Otherwise, placing keys with
+ * different byte images into the same group may result in the loss of
+ * information that could be necessary to evaluate upper qual clauses.
+ *
+ * For instance, the NUMERIC data type is not supported, as values
+ * that are considered equal by the equality operator (e.g., 0 and
+ * 0.0) can have different scales.
+ */
+ tce = lookup_type_cache(exprType((Node *) tle->expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return;
+
+ exprs = lappend(exprs, tle->expr);
+ sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref);
+ ecs = lappend(ecs, get_eclass_for_sortgroupclause(root, sgc, tle->expr));
+ }
+
+ /*
+ * Construct a GroupingExprInfo for each expression.
+ */
+ forthree(lc1, exprs, lc2, sortgrouprefs, lc3, ecs)
+ {
+ Expr *expr = (Expr *) lfirst(lc1);
+ int sortgroupref = lfirst_int(lc2);
+ EquivalenceClass *ec = (EquivalenceClass *) lfirst(lc3);
+ GroupingExprInfo *ge_info;
+
+ ge_info = makeNode(GroupingExprInfo);
+ ge_info->expr = (Expr *) copyObject(expr);
+ ge_info->sortgroupref = sortgroupref;
+ ge_info->ec = ec;
+
+ root->group_expr_list = lappend(root->group_expr_list, ge_info);
+ }
+}
+
+/*
+ * get_eclass_for_sortgroupclause
+ * Given a group clause and an expression, find an existing equivalence
+ * class that the expression is a member of; return NULL if none.
+ */
+static EquivalenceClass *
+get_eclass_for_sortgroupclause(PlannerInfo *root, SortGroupClause *sgc,
+ Expr *expr)
+{
+ Oid opfamily,
+ opcintype,
+ collation;
+ CompareType cmptype;
+ Oid equality_op;
+ List *opfamilies;
+
+ /* Punt if the group clause is not sortable */
+ if (!OidIsValid(sgc->sortop))
+ return NULL;
+
+ /* Find the operator in pg_amop --- failure shouldn't happen */
+ if (!get_ordering_op_properties(sgc->sortop,
+ &opfamily, &opcintype, &cmptype))
+ elog(ERROR, "operator %u is not a valid ordering operator",
+ sgc->sortop);
+
+ /* Because SortGroupClause doesn't carry collation, consult the expr */
+ collation = exprCollation((Node *) expr);
+
+ /*
+ * EquivalenceClasses need to contain opfamily lists based on the family
+ * membership of mergejoinable equality operators, which could belong to
+ * more than one opfamily. So we have to look up the opfamily's equality
+ * operator and get its membership.
+ */
+ equality_op = get_opfamily_member_for_cmptype(opfamily,
+ opcintype,
+ opcintype,
+ COMPARE_EQ);
+ if (!OidIsValid(equality_op)) /* shouldn't happen */
+ elog(ERROR, "missing operator %d(%u,%u) in opfamily %u",
+ COMPARE_EQ, opcintype, opcintype, opfamily);
+ opfamilies = get_mergejoin_opfamilies(equality_op);
+ if (!opfamilies) /* certainly should find some */
+ elog(ERROR, "could not find opfamilies for equality operator %u",
+ equality_op);
+
+ /* Now find a matching EquivalenceClass */
+ return get_eclass_for_sort_expr(root, expr, opfamilies, opcintype,
+ collation, sgc->tleSortGroupRef,
+ NULL, false);
+}
+
/*****************************************************************************
*
* LATERAL REFERENCES
diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c
index 5467e094ca7..eefc486a566 100644
--- a/src/backend/optimizer/plan/planmain.c
+++ b/src/backend/optimizer/plan/planmain.c
@@ -76,6 +76,9 @@ query_planner(PlannerInfo *root,
root->placeholder_list = NIL;
root->placeholder_array = NULL;
root->placeholder_array_size = 0;
+ root->agg_clause_list = NIL;
+ root->group_expr_list = NIL;
+ root->tlist_vars = NIL;
root->fkey_list = NIL;
root->initial_rels = NIL;
@@ -265,6 +268,12 @@ query_planner(PlannerInfo *root,
*/
extract_restriction_or_clauses(root);
+ /*
+ * Check if eager aggregation is applicable, and if so, set up
+ * root->agg_clause_list and root->group_expr_list.
+ */
+ setup_eager_aggregation(root);
+
/*
* Now expand appendrels by adding "otherrels" for their children. We
* delay this to the end so that we have as much information as possible
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 41bd8353430..462c5335589 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -232,7 +232,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
grouping_sets_data *gd,
- double dNumGroups,
GroupPathExtraData *extra);
static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root,
RelOptInfo *grouped_rel,
@@ -4010,9 +4009,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
GroupPathExtraData *extra,
RelOptInfo **partially_grouped_rel_p)
{
- Path *cheapest_path = input_rel->cheapest_total_path;
RelOptInfo *partially_grouped_rel = NULL;
- double dNumGroups;
PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE;
/*
@@ -4094,23 +4091,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel,
/* Gather any partially grouped partial paths. */
if (partially_grouped_rel && partially_grouped_rel->partial_pathlist)
- {
gather_grouping_paths(root, partially_grouped_rel);
- set_cheapest(partially_grouped_rel);
- }
- /*
- * Estimate number of groups.
- */
- dNumGroups = get_number_of_groups(root,
- cheapest_path->rows,
- gd,
- extra->targetList);
+ /* Now choose the best path(s) for partially_grouped_rel. */
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ set_cheapest(partially_grouped_rel);
/* Build final grouping paths */
add_paths_to_grouping_rel(root, input_rel, grouped_rel,
partially_grouped_rel, agg_costs, gd,
- dNumGroups, extra);
+ extra);
/* Give a helpful error if we failed to find any implementation */
if (grouped_rel->pathlist == NIL)
@@ -7055,16 +7045,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
RelOptInfo *grouped_rel,
RelOptInfo *partially_grouped_rel,
const AggClauseCosts *agg_costs,
- grouping_sets_data *gd, double dNumGroups,
+ grouping_sets_data *gd,
GroupPathExtraData *extra)
{
Query *parse = root->parse;
Path *cheapest_path = input_rel->cheapest_total_path;
+ Path *cheapest_partially_grouped_path = NULL;
ListCell *lc;
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
List *havingQual = (List *) extra->havingQual;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
+ double dNumGroups = 0;
+ double dNumFinalGroups = 0;
+
+ /*
+ * Estimate number of groups for non-split aggregation.
+ */
+ dNumGroups = get_number_of_groups(root,
+ cheapest_path->rows,
+ gd,
+ extra->targetList);
+
+ if (partially_grouped_rel && partially_grouped_rel->pathlist)
+ {
+ cheapest_partially_grouped_path =
+ partially_grouped_rel->cheapest_total_path;
+
+ /*
+ * Estimate number of groups for final phase of partial aggregation.
+ */
+ dNumFinalGroups =
+ get_number_of_groups(root,
+ cheapest_partially_grouped_path->rows,
+ gd,
+ extra->targetList);
+ }
if (can_sort)
{
@@ -7177,7 +7193,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path = make_ordered_path(root,
grouped_rel,
path,
- partially_grouped_rel->cheapest_total_path,
+ cheapest_partially_grouped_path,
info->pathkeys,
-1.0);
@@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
info->clauses,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
else
add_path(grouped_rel, (Path *)
create_group_path(root,
@@ -7203,7 +7219,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
path,
info->clauses,
havingQual,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7245,19 +7261,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel,
*/
if (partially_grouped_rel && partially_grouped_rel->pathlist)
{
- Path *path = partially_grouped_rel->cheapest_total_path;
-
add_path(grouped_rel, (Path *)
create_agg_path(root,
grouped_rel,
- path,
+ cheapest_partially_grouped_path,
grouped_rel->reltarget,
AGG_HASHED,
AGGSPLIT_FINAL_DESERIAL,
root->processed_groupClause,
havingQual,
agg_final_costs,
- dNumGroups));
+ dNumFinalGroups));
}
}
@@ -7297,6 +7311,7 @@ create_partial_grouping_paths(PlannerInfo *root,
{
Query *parse = root->parse;
RelOptInfo *partially_grouped_rel;
+ RelOptInfo *eager_agg_rel = NULL;
AggClauseCosts *agg_partial_costs = &extra->agg_partial_costs;
AggClauseCosts *agg_final_costs = &extra->agg_final_costs;
Path *cheapest_partial_path = NULL;
@@ -7307,6 +7322,15 @@ create_partial_grouping_paths(PlannerInfo *root,
bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0;
bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0;
+ /*
+ * Check whether any partially aggregated paths have been generated
+ * through eager aggregation.
+ */
+ if (input_rel->grouped_rel &&
+ !IS_DUMMY_REL(input_rel->grouped_rel) &&
+ input_rel->grouped_rel->pathlist != NIL)
+ eager_agg_rel = input_rel->grouped_rel;
+
/*
* Consider whether we should generate partially aggregated non-partial
* paths. We can only do this if we have a non-partial path, and only if
@@ -7328,11 +7352,13 @@ create_partial_grouping_paths(PlannerInfo *root,
/*
* If we can't partially aggregate partial paths, and we can't partially
- * aggregate non-partial paths, then don't bother creating the new
+ * aggregate non-partial paths, and no partially aggregated paths were
+ * generated by eager aggregation, then don't bother creating the new
* RelOptInfo at all, unless the caller specified force_rel_creation.
*/
if (cheapest_total_path == NULL &&
cheapest_partial_path == NULL &&
+ eager_agg_rel == NULL &&
!force_rel_creation)
return NULL;
@@ -7557,6 +7583,51 @@ create_partial_grouping_paths(PlannerInfo *root,
dNumPartialPartialGroups));
}
+ /*
+ * Add any partially aggregated paths generated by eager aggregation to
+ * the new upper relation after applying projection steps as needed.
+ */
+ if (eager_agg_rel)
+ {
+ /* Add the paths */
+ foreach(lc, eager_agg_rel->pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_path(partially_grouped_rel, path);
+ }
+
+ /*
+ * Likewise add the partial paths, but only if parallelism is possible
+ * for partially_grouped_rel.
+ */
+ if (partially_grouped_rel->consider_parallel)
+ {
+ foreach(lc, eager_agg_rel->partial_pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+
+ /* Shouldn't have any parameterized paths anymore */
+ Assert(path->param_info == NULL);
+
+ path = (Path *) create_projection_path(root,
+ partially_grouped_rel,
+ path,
+ partially_grouped_rel->reltarget);
+
+ add_partial_path(partially_grouped_rel, path);
+ }
+ }
+ }
+
/*
* If there is an FDW that's responsible for all baserels of the query,
* let it consider adding partially grouped ForeignPaths.
@@ -8120,13 +8191,6 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, partially_grouped_rel,
partially_grouped_live_children);
-
- /*
- * We need call set_cheapest, since the finalization step will use the
- * cheapest path from the rel.
- */
- if (partially_grouped_rel->pathlist)
- set_cheapest(partially_grouped_rel);
}
/* If possible, create append paths for fully grouped children. */
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index 5b3dc0d8653..69b8b0c2ae0 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -516,6 +516,57 @@ adjust_appendrel_attrs_mutator(Node *node,
return (Node *) newinfo;
}
+ /*
+ * We have to process RelAggInfo nodes specially.
+ */
+ if (IsA(node, RelAggInfo))
+ {
+ RelAggInfo *oldinfo = (RelAggInfo *) node;
+ RelAggInfo *newinfo = makeNode(RelAggInfo);
+
+ newinfo->target = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->target,
+ context);
+
+ newinfo->agg_input = (PathTarget *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input,
+ context);
+
+ newinfo->group_clauses = oldinfo->group_clauses;
+
+ newinfo->group_exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs,
+ context);
+
+ return (Node *) newinfo;
+ }
+
+ /*
+ * We have to process PathTarget nodes specially.
+ */
+ if (IsA(node, PathTarget))
+ {
+ PathTarget *oldtarget = (PathTarget *) node;
+ PathTarget *newtarget = makeNode(PathTarget);
+
+ /* Copy all flat-copiable fields */
+ memcpy(newtarget, oldtarget, sizeof(PathTarget));
+
+ newtarget->exprs = (List *)
+ adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs,
+ context);
+
+ if (oldtarget->sortgrouprefs)
+ {
+ Size nbytes = list_length(oldtarget->exprs) * sizeof(Index);
+
+ newtarget->sortgrouprefs = (Index *) palloc(nbytes);
+ memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes);
+ }
+
+ return (Node *) newtarget;
+ }
+
/*
* NOTE: we do not need to recurse into sublinks, because they should
* already have been converted to subplans before we see them.
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 0e523d2eb5b..cf1bc672137 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -16,6 +16,8 @@
#include <limits.h>
+#include "access/nbtree.h"
+#include "catalog/pg_constraint.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/appendinfo.h"
@@ -27,12 +29,16 @@
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
+#include "optimizer/planner.h"
#include "optimizer/restrictinfo.h"
#include "optimizer/tlist.h"
+#include "parser/parse_oper.h"
#include "parser/parse_relation.h"
#include "rewrite/rewriteManip.h"
#include "utils/hsearch.h"
#include "utils/lsyscache.h"
+#include "utils/selfuncs.h"
+#include "utils/typcache.h"
typedef struct JoinHashEntry
@@ -83,6 +89,14 @@ static void build_child_join_reltarget(PlannerInfo *root,
RelOptInfo *childrel,
int nappinfos,
AppendRelInfo **appinfos);
+static bool eager_aggregation_possible_for_relation(PlannerInfo *root,
+ RelOptInfo *rel);
+static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs);
+static bool is_var_in_aggref_only(PlannerInfo *root, Var *var);
+static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel);
+static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr);
/*
@@ -278,6 +292,8 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
rel->joininfo = NIL;
rel->has_eclass_joins = false;
rel->consider_partitionwise_join = false; /* might get changed later */
+ rel->agg_info = NULL;
+ rel->grouped_rel = NULL;
rel->part_scheme = NULL;
rel->nparts = -1;
rel->boundinfo = NULL;
@@ -408,6 +424,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
return rel;
}
+/*
+ * build_simple_grouped_rel
+ * Construct a new RelOptInfo representing a grouped version of the input
+ * simple relation.
+ */
+RelOptInfo *
+build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ /*
+ * We should have available aggregate expressions and grouping
+ * expressions, otherwise we cannot reach here.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /* nothing to do for dummy rel */
+ if (IS_DUMMY_REL(rel))
+ return NULL;
+
+ /*
+ * Prepare the information needed to create grouped paths for this simple
+ * relation.
+ */
+ agg_info = create_rel_agg_info(root, rel, true);
+ if (agg_info == NULL)
+ return NULL;
+
+ /*
+ * If grouped paths for the given simple relation are not considered
+ * useful, skip building the grouped relation.
+ */
+ if (!agg_info->agg_useful)
+ return NULL;
+
+ /* Track the set of relids at which partial aggregation is applied */
+ agg_info->apply_at = bms_copy(rel->relids);
+
+ /* build the grouped relation */
+ grouped_rel = build_grouped_rel(root, rel);
+ grouped_rel->reltarget = agg_info->target;
+ grouped_rel->rows = agg_info->grouped_rows;
+ grouped_rel->agg_info = agg_info;
+
+ rel->grouped_rel = grouped_rel;
+
+ return grouped_rel;
+}
+
+/*
+ * build_grouped_rel
+ * Build a grouped relation by flat copying the input relation and resetting
+ * the necessary fields.
+ */
+RelOptInfo *
+build_grouped_rel(PlannerInfo *root, RelOptInfo *rel)
+{
+ RelOptInfo *grouped_rel;
+
+ grouped_rel = makeNode(RelOptInfo);
+ memcpy(grouped_rel, rel, sizeof(RelOptInfo));
+
+ /*
+ * clear path info
+ */
+ grouped_rel->pathlist = NIL;
+ grouped_rel->ppilist = NIL;
+ grouped_rel->partial_pathlist = NIL;
+ grouped_rel->cheapest_startup_path = NULL;
+ grouped_rel->cheapest_total_path = NULL;
+ grouped_rel->cheapest_parameterized_paths = NIL;
+
+ /*
+ * clear partition info
+ */
+ grouped_rel->part_scheme = NULL;
+ grouped_rel->nparts = -1;
+ grouped_rel->boundinfo = NULL;
+ grouped_rel->partbounds_merged = false;
+ grouped_rel->partition_qual = NIL;
+ grouped_rel->part_rels = NULL;
+ grouped_rel->live_parts = NULL;
+ grouped_rel->all_partrels = NULL;
+ grouped_rel->partexprs = NULL;
+ grouped_rel->nullable_partexprs = NULL;
+ grouped_rel->consider_partitionwise_join = false;
+
+ /*
+ * clear size estimates
+ */
+ grouped_rel->rows = 0;
+
+ return grouped_rel;
+}
+
/*
* find_base_rel
* Find a base or otherrel relation entry, which must already exist.
@@ -759,6 +872,8 @@ build_join_rel(PlannerInfo *root,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = NULL;
joinrel->top_parent = NULL;
joinrel->top_parent_relids = NULL;
@@ -945,6 +1060,8 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
joinrel->joininfo = NIL;
joinrel->has_eclass_joins = false;
joinrel->consider_partitionwise_join = false; /* might get changed later */
+ joinrel->agg_info = NULL;
+ joinrel->grouped_rel = NULL;
joinrel->parent = parent_joinrel;
joinrel->top_parent = parent_joinrel->top_parent ? parent_joinrel->top_parent : parent_joinrel;
joinrel->top_parent_relids = joinrel->top_parent->relids;
@@ -2523,3 +2640,536 @@ build_child_join_reltarget(PlannerInfo *root,
childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple;
childrel->reltarget->width = parentrel->reltarget->width;
}
+
+/*
+ * create_rel_agg_info
+ * Create the RelAggInfo structure for the given relation if it can produce
+ * grouped paths. The given relation is the non-grouped one which has the
+ * reltarget already constructed.
+ *
+ * calculate_grouped_rows: if true, calculate the estimated number of grouped
+ * rows for the relation. If false, skip the estimation to avoid unnecessary
+ * planning overhead.
+ */
+RelAggInfo *
+create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel,
+ bool calculate_grouped_rows)
+{
+ ListCell *lc;
+ RelAggInfo *result;
+ PathTarget *agg_input;
+ PathTarget *target;
+ List *group_clauses = NIL;
+ List *group_exprs = NIL;
+
+ /*
+ * The lists of aggregate expressions and grouping expressions should have
+ * been constructed.
+ */
+ Assert(root->agg_clause_list != NIL);
+ Assert(root->group_expr_list != NIL);
+
+ /*
+ * If this is a child rel, the grouped rel for its parent rel must already
+ * have been created, if that was possible. So we can just use the parent's
+ * RelAggInfo if there is one, with appropriate variable substitutions.
+ */
+ if (IS_OTHER_REL(rel))
+ {
+ RelOptInfo *grouped_rel;
+ RelAggInfo *agg_info;
+
+ grouped_rel = rel->top_parent->grouped_rel;
+ if (grouped_rel == NULL)
+ return NULL;
+
+ Assert(IS_GROUPED_REL(grouped_rel));
+
+ /* Must do multi-level transformation */
+ agg_info = (RelAggInfo *)
+ adjust_appendrel_attrs_multilevel(root,
+ (Node *) grouped_rel->agg_info,
+ rel,
+ rel->top_parent);
+
+ agg_info->apply_at = NULL; /* caller will change this later */
+
+ if (calculate_grouped_rows)
+ {
+ agg_info->grouped_rows =
+ estimate_num_groups(root, agg_info->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful
+ * iff the average group size is no less than
+ * min_eager_agg_group_size.
+ */
+ agg_info->agg_useful =
+ (rel->rows / agg_info->grouped_rows) >= min_eager_agg_group_size;
+ }
+
+ return agg_info;
+ }
+
+ /* Check if it's possible to produce grouped paths for this relation. */
+ if (!eager_aggregation_possible_for_relation(root, rel))
+ return NULL;
+
+ /*
+ * Create targets for the grouped paths and for the input paths of the
+ * grouped paths.
+ */
+ target = create_empty_pathtarget();
+ agg_input = create_empty_pathtarget();
+
+ /* ... and initialize these targets */
+ if (!init_grouping_targets(root, rel, target, agg_input,
+ &group_clauses, &group_exprs))
+ return NULL;
+
+ /*
+ * Eager aggregation is not applicable if there are no available grouping
+ * expressions.
+ */
+ if (group_clauses == NIL)
+ return NULL;
+
+ /* Add aggregates to the grouping target */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ Aggref *aggref;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ aggref = (Aggref *) copyObject(ac_info->aggref);
+ mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL);
+
+ add_column_to_pathtarget(target, (Expr *) aggref, 0);
+ }
+
+ /* Set the estimated eval cost and output width for both targets */
+ set_pathtarget_cost_width(root, target);
+ set_pathtarget_cost_width(root, agg_input);
+
+ /* build the RelAggInfo result */
+ result = makeNode(RelAggInfo);
+ result->target = target;
+ result->agg_input = agg_input;
+ result->group_clauses = group_clauses;
+ result->group_exprs = group_exprs;
+ result->apply_at = NULL; /* caller will change this later */
+
+ if (calculate_grouped_rows)
+ {
+ result->grouped_rows = estimate_num_groups(root, result->group_exprs,
+ rel->rows, NULL, NULL);
+
+ /*
+ * The grouped paths for the given relation are considered useful iff
+ * the average group size is no less than min_eager_agg_group_size.
+ */
+ result->agg_useful =
+ (rel->rows / result->grouped_rows) >= min_eager_agg_group_size;
+ }
+
+ return result;
+}
+
+/*
+ * eager_aggregation_possible_for_relation
+ * Check if it's possible to produce grouped paths for the given relation.
+ */
+static bool
+eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
+{
+ ListCell *lc;
+ int cur_relid;
+
+ /*
+ * Check to see if the given relation is on the nullable side of an outer
+ * join. In this case, we cannot push a partial aggregation down to the
+ * relation, because the NULL-extended rows produced by the outer join
+ * would not be available when we perform the partial aggregation, while
+ * with a non-eager-aggregation plan these rows are available for the
+ * top-level aggregation. Doing so may result in the rows being grouped
+ * differently than expected, or produce incorrect values from the
+ * aggregate functions.
+ */
+ cur_relid = -1;
+ while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0)
+ {
+ RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid);
+
+ if (baserel == NULL)
+ continue; /* ignore outer joins in rel->relids */
+
+ if (!bms_is_subset(baserel->nulling_relids, rel->relids))
+ return false;
+ }
+
+ /*
+ * For now we don't try to support PlaceHolderVars.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = lfirst(lc);
+
+ if (IsA(expr, PlaceHolderVar))
+ return false;
+ }
+
+ /* Caller should only pass base relations or joins. */
+ Assert(rel->reloptkind == RELOPT_BASEREL ||
+ rel->reloptkind == RELOPT_JOINREL);
+
+ /*
+ * Check if all aggregate expressions can be evaluated at this relation
+ * level.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ /*
+ * Give up if any aggregate requires relations other than the current
+ * one. If the aggregate requires the current relation plus
+ * additional relations, grouping the current relation could make some
+ * input rows unavailable for the higher aggregate and may reduce the
+ * number of input rows it receives. If the aggregate does not
+ * require the current relation at all, it should not be grouped, as
+ * we do not support joining two grouped relations.
+ */
+ if (!bms_is_subset(ac_info->agg_eval_at, rel->relids))
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * init_grouping_targets
+ * Initialize the target for grouped paths (target) as well as the target
+ * for paths that generate input for the grouped paths (agg_input).
+ *
+ * We also construct the list of SortGroupClauses and the list of grouping
+ * expressions for the partial aggregation, and return them in *group_clauses
+ * and *group_exprs.
+ *
+ * Return true if the targets could be initialized, false otherwise.
+ */
+static bool
+init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
+ PathTarget *target, PathTarget *agg_input,
+ List **group_clauses, List **group_exprs)
+{
+ ListCell *lc;
+ List *possibly_dependent = NIL;
+ Index maxSortGroupRef;
+
+ /* Identify the max sortgroupref */
+ maxSortGroupRef = 0;
+ foreach(lc, root->processed_tlist)
+ {
+ Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref;
+
+ if (ref > maxSortGroupRef)
+ maxSortGroupRef = ref;
+ }
+
+ /*
+ * At this point, all Vars from this relation that are needed by upper
+ * joins or are required in the final targetlist should already be present
+ * in its reltarget. Therefore, we can safely iterate over this
+ * relation's reltarget->exprs to construct the PathTarget and grouping
+ * clauses for the grouped paths.
+ */
+ foreach(lc, rel->reltarget->exprs)
+ {
+ Expr *expr = (Expr *) lfirst(lc);
+ Index sortgroupref;
+
+ /*
+ * Given that PlaceHolderVar currently prevents us from doing eager
+ * aggregation, the source target cannot contain anything more complex
+ * than a Var.
+ */
+ Assert(IsA(expr, Var));
+
+ /*
+ * Get the sortgroupref of the expr if it is found among, or can be
+ * deduced from, the original grouping expressions.
+ */
+ sortgroupref = get_expression_sortgroupref(root, expr);
+ if (sortgroupref > 0)
+ {
+ SortGroupClause *sgc;
+
+ /* Find the matching SortGroupClause */
+ sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause);
+ Assert(sgc->tleSortGroupRef <= maxSortGroupRef);
+
+ /*
+ * If the target expression is to be used as a grouping key, it
+ * should be emitted by the grouped paths that have been pushed
+ * down to this relation level.
+ */
+ add_column_to_pathtarget(target, expr, sortgroupref);
+
+ /*
+ * ... and it also should be emitted by the input paths.
+ */
+ add_column_to_pathtarget(agg_input, expr, sortgroupref);
+
+ /*
+ * Record this SortGroupClause and grouping expression. Note that
+ * this SortGroupClause might have already been recorded.
+ */
+ if (!list_member(*group_clauses, sgc))
+ {
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ }
+ else if (is_var_needed_by_join(root, (Var *) expr, rel))
+ {
+ /*
+ * The expression is needed for an upper join but is neither in
+ * the GROUP BY clause nor derivable from it using EC (otherwise,
+ * it would have already been included in the targets above). We
+ * need to create a special SortGroupClause for this expression.
+ *
+ * It is important to include such expressions in the grouping
+ * keys. This is essential to ensure that an aggregated row from
+ * the partial aggregation matches the other side of the join if
+ * and only if each row in the partial group does. This ensures
+ * that all rows within the same partial group share the same
+ * 'destiny', which is crucial for maintaining correctness.
+ */
+ SortGroupClause *sgc;
+ TypeCacheEntry *tce;
+ Oid equalimageproc;
+
+ /*
+ * But first, check if equality implies image equality for this
+ * expression. If not, we cannot use it as a grouping key. See
+ * comments in create_grouping_expr_infos().
+ */
+ tce = lookup_type_cache(exprType((Node *) expr),
+ TYPECACHE_BTREE_OPFAMILY);
+ if (!OidIsValid(tce->btree_opf) ||
+ !OidIsValid(tce->btree_opintype))
+ return false;
+
+ equalimageproc = get_opfamily_proc(tce->btree_opf,
+ tce->btree_opintype,
+ tce->btree_opintype,
+ BTEQUALIMAGE_PROC);
+ if (!OidIsValid(equalimageproc) ||
+ !DatumGetBool(OidFunctionCall1Coll(equalimageproc,
+ tce->typcollation,
+ ObjectIdGetDatum(tce->btree_opintype))))
+ return false;
+
+ /* Create the SortGroupClause. */
+ sgc = makeNode(SortGroupClause);
+
+ /* Initialize the SortGroupClause. */
+ sgc->tleSortGroupRef = ++maxSortGroupRef;
+ get_sort_group_operators(exprType((Node *) expr),
+ false, true, false,
+ &sgc->sortop, &sgc->eqop, NULL,
+ &sgc->hashable);
+
+ /* This expression should be emitted by the grouped paths */
+ add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef);
+
+ /* ... and it also should be emitted by the input paths. */
+ add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef);
+
+ /* Record this SortGroupClause and grouping expression */
+ *group_clauses = lappend(*group_clauses, sgc);
+ *group_exprs = lappend(*group_exprs, expr);
+ }
+ else if (is_var_in_aggref_only(root, (Var *) expr))
+ {
+ /*
+ * The expression is referenced by an aggregate function pushed
+ * down to this relation and does not appear elsewhere in the
+ * targetlist or havingQual. Add it to 'agg_input' but not to
+ * 'target'.
+ */
+ add_new_column_to_pathtarget(agg_input, expr);
+ }
+ else
+ {
+ /*
+ * The expression may be functionally dependent on other
+ * expressions in the target, but we cannot verify this until all
+ * target expressions have been constructed.
+ */
+ possibly_dependent = lappend(possibly_dependent, expr);
+ }
+ }
+
+ /*
+ * Now we can verify whether an expression is functionally dependent on
+ * others.
+ */
+ foreach(lc, possibly_dependent)
+ {
+ Var *tvar;
+ List *deps = NIL;
+ RangeTblEntry *rte;
+
+ tvar = lfirst_node(Var, lc);
+ rte = root->simple_rte_array[tvar->varno];
+
+ if (check_functional_grouping(rte->relid, tvar->varno,
+ tvar->varlevelsup,
+ target->exprs, &deps))
+ {
+ /*
+ * The expression is functionally dependent on other target
+ * expressions, so it can be included in the targets. Since it
+ * will not be used as a grouping key, a sortgroupref is not
+ * needed for it.
+ */
+ add_new_column_to_pathtarget(target, (Expr *) tvar);
+ add_new_column_to_pathtarget(agg_input, (Expr *) tvar);
+ }
+ else
+ {
+ /*
+ * We may arrive here with a grouping expression that is proven
+ * redundant by EquivalenceClass processing, such as 't1.a' in the
+ * query below.
+ *
+ * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a,
+ * t1.b;
+ *
+ * For now we just give up in this case.
+ */
+ return false;
+ }
+ }
+
+ return true;
+}
+
+/*
+ * is_var_in_aggref_only
+ * Check whether the given Var appears in aggregate expressions and not
+ * elsewhere in the targetlist or havingQual.
+ */
+static bool
+is_var_in_aggref_only(PlannerInfo *root, Var *var)
+{
+ ListCell *lc;
+
+ /*
+ * Search the list of aggregate expressions for the Var.
+ */
+ foreach(lc, root->agg_clause_list)
+ {
+ AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc);
+ List *vars;
+
+ Assert(IsA(ac_info->aggref, Aggref));
+
+ if (!bms_is_member(var->varno, ac_info->agg_eval_at))
+ continue;
+
+ vars = pull_var_clause((Node *) ac_info->aggref,
+ PVC_RECURSE_AGGREGATES |
+ PVC_RECURSE_WINDOWFUNCS |
+ PVC_RECURSE_PLACEHOLDERS);
+
+ if (list_member(vars, var))
+ {
+ list_free(vars);
+ break;
+ }
+
+ list_free(vars);
+ }
+
+ return (lc != NULL && !list_member(root->tlist_vars, var));
+}
+
+/*
+ * is_var_needed_by_join
+ * Check if the given Var is needed by joins above the current rel.
+ */
+static bool
+is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel)
+{
+ Relids relids;
+ int attno;
+ RelOptInfo *baserel;
+
+ /*
+ * Note that when checking if the Var is needed by joins above, we want to
+ * exclude cases where the Var is only needed in the final targetlist. So
+ * include "relation 0" in the check.
+ */
+ relids = bms_copy(rel->relids);
+ relids = bms_add_member(relids, 0);
+
+ baserel = find_base_rel(root, var->varno);
+ attno = var->varattno - baserel->min_attr;
+
+ return bms_nonempty_difference(baserel->attr_needed[attno], relids);
+}
+
+/*
+ * get_expression_sortgroupref
+ * Return the sortgroupref of the given "expr" if it is found among the
+ * original grouping expressions, or is known equal to any of the original
+ * grouping expressions due to equivalence relationships. Return 0 if no
+ * match is found.
+ */
+static Index
+get_expression_sortgroupref(PlannerInfo *root, Expr *expr)
+{
+ ListCell *lc;
+
+ Assert(IsA(expr, Var));
+
+ foreach(lc, root->group_expr_list)
+ {
+ GroupingExprInfo *ge_info = lfirst_node(GroupingExprInfo, lc);
+ ListCell *lc1;
+
+ Assert(IsA(ge_info->expr, Var));
+ Assert(ge_info->sortgroupref > 0);
+
+ if (equal(expr, ge_info->expr))
+ return ge_info->sortgroupref;
+
+ if (ge_info->ec == NULL ||
+ !bms_is_member(((Var *) expr)->varno, ge_info->ec->ec_relids))
+ continue;
+
+ /*
+ * Scan the EquivalenceClass, looking for a match to the given
+ * expression. We ignore child members here.
+ */
+ foreach(lc1, ge_info->ec->ec_members)
+ {
+ EquivalenceMember *em = (EquivalenceMember *) lfirst(lc1);
+
+ /* Child members should not exist in ec_members */
+ Assert(!em->em_is_child);
+
+ if (equal(expr, em->em_expr))
+ return ge_info->sortgroupref;
+ }
+ }
+
+ /* no match is found */
+ return 0;
+}
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 6bc6be13d2a..b176d5130e4 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -145,6 +145,13 @@
boot_val => 'false',
},
+{ name => 'enable_eager_aggregate', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+ short_desc => 'Enables eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'enable_eager_aggregate',
+ boot_val => 'true',
+},
+
{ name => 'enable_parallel_append', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
short_desc => 'Enables the planner\'s use of parallel append plans.',
flags => 'GUC_EXPLAIN',
@@ -2427,6 +2434,15 @@
max => 'DBL_MAX',
},
+{ name => 'min_eager_agg_group_size', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_COST',
+ short_desc => 'Sets the minimum average group size required to consider applying eager aggregation.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'min_eager_agg_group_size',
+ boot_val => '8.0',
+ min => '0.0',
+ max => 'DBL_MAX',
+},
+
{ name => 'cursor_tuple_fraction', type => 'real', context => 'PGC_USERSET', group => 'QUERY_TUNING_OTHER',
short_desc => 'Sets the planner\'s estimate of the fraction of a cursor\'s rows that will be retrieved.',
flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c36fcb9ab61..c5d612ab552 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -428,6 +428,7 @@
#enable_group_by_reordering = on
#enable_distinct_reordering = on
#enable_self_join_elimination = on
+#enable_eager_aggregate = on
# - Planner Cost Constants -
@@ -441,6 +442,7 @@
#min_parallel_table_scan_size = 8MB
#min_parallel_index_scan_size = 512kB
#effective_cache_size = 4GB
+#min_eager_agg_group_size = 8.0
#jit_above_cost = 100000 # perform JIT compilation if available
# and query more expensive than this;
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b12a2508d8c..798b431c5aa 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -391,6 +391,15 @@ struct PlannerInfo
/* list of PlaceHolderInfos */
List *placeholder_list;
+ /* list of AggClauseInfos */
+ List *agg_clause_list;
+
+ /* list of GroupingExprInfos */
+ List *group_expr_list;
+
+ /* list of plain Vars contained in targetlist and havingQual */
+ List *tlist_vars;
+
/* array of PlaceHolderInfos indexed by phid */
struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size));
/* allocated size of array */
@@ -1040,6 +1049,14 @@ typedef struct RelOptInfo
/* consider partitionwise join paths? (if partitioned rel) */
bool consider_partitionwise_join;
+ /*
+ * used by eager aggregation:
+ */
+ /* information needed to create grouped paths */
+ struct RelAggInfo *agg_info;
+ /* the partially-aggregated version of the relation */
+ struct RelOptInfo *grouped_rel;
+
/*
* inheritance links, if this is an otherrel (otherwise NULL):
*/
@@ -1124,6 +1141,63 @@ typedef struct RelOptInfo
((nominal_jointype) == JOIN_INNER && (sjinfo)->jointype == JOIN_SEMI && \
bms_equal((sjinfo)->syn_righthand, (rel)->relids))
+/*
+ * Is the given relation a grouped relation?
+ */
+#define IS_GROUPED_REL(rel) \
+ ((rel)->agg_info != NULL)
+
+/*
+ * RelAggInfo
+ * Information needed to create paths for a grouped relation.
+ *
+ * "target" is the default result targetlist for Paths scanning this grouped
+ * relation; list of Vars/Exprs, cost, width.
+ *
+ * "agg_input" is the output tlist for the paths that provide input to the
+ * grouped paths. One difference from the reltarget of the non-grouped
+ * relation is that agg_input has its sortgrouprefs[] initialized.
+ *
+ * "group_clauses" and "group_exprs" are lists of SortGroupClauses and the
+ * corresponding grouping expressions.
+ *
+ * "apply_at" tracks the set of relids at which partial aggregation is applied
+ * in the paths of this grouped relation.
+ *
+ * "grouped_rows" is the estimated number of result tuples of the grouped
+ * relation.
+ *
+ * "agg_useful" is a flag to indicate whether the grouped paths are considered
+ * useful. It is set true if the average partial group size is no less than
+ * min_eager_agg_group_size, suggesting a significant row count reduction.
+ */
+typedef struct RelAggInfo
+{
+ pg_node_attr(no_copy_equal, no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the output tlist for the grouped paths */
+ struct PathTarget *target;
+
+ /* the output tlist for the input paths */
+ struct PathTarget *agg_input;
+
+ /* a list of SortGroupClauses */
+ List *group_clauses;
+ /* a list of grouping expressions */
+ List *group_exprs;
+
+ /* the set of relids partial aggregation is applied at */
+ Relids apply_at;
+
+ /* estimated number of result tuples */
+ Cardinality grouped_rows;
+
+ /* the grouped paths are considered useful? */
+ bool agg_useful;
+} RelAggInfo;
+
/*
* IndexOptInfo
* Per-index information for planning/optimization
@@ -3268,6 +3342,49 @@ typedef struct MinMaxAggInfo
Param *param;
} MinMaxAggInfo;
+/*
+ * For each distinct Aggref node that appears in the targetlist and HAVING
+ * clauses, we store an AggClauseInfo node in the PlannerInfo node's
+ * agg_clause_list. Each AggClauseInfo records the set of relations referenced
+ * by the aggregate expression. This information is used to determine how far
+ * the aggregate can be safely pushed down in the join tree.
+ */
+typedef struct AggClauseInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the Aggref expr */
+ Aggref *aggref;
+
+ /* lowest level we can evaluate this aggregate at */
+ Relids agg_eval_at;
+} AggClauseInfo;
+
+/*
+ * For each grouping expression that appears in grouping clauses, we store a
+ * GroupingExprInfo node in the PlannerInfo node's group_expr_list. Each
+ * GroupingExprInfo records the expression being grouped on, its sortgroupref,
+ * and the EquivalenceClass it belongs to. This information is necessary to
+ * reproduce correct grouping semantics at different levels of the join tree.
+ */
+typedef struct GroupingExprInfo
+{
+ pg_node_attr(no_read, no_query_jumble)
+
+ NodeTag type;
+
+ /* the represented expression */
+ Expr *expr;
+
+ /* the tleSortGroupRef of the corresponding SortGroupClause */
+ Index sortgroupref;
+
+ /* the equivalence class the expression belongs to */
+ EquivalenceClass *ec pg_node_attr(copy_as_scalar, equal_as_scalar);
+} GroupingExprInfo;
+
/*
* At runtime, PARAM_EXEC slots are used to pass values around from one plan
* node to another. They can be used to pass values down into subqueries (for
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 763cd25bb3c..da60383c2aa 100644
--- a/src/include/optimizer/pathnode.h
+++ b/src/include/optimizer/pathnode.h
@@ -312,6 +312,8 @@ extern void setup_simple_rel_arrays(PlannerInfo *root);
extern void expand_planner_arrays(PlannerInfo *root, int add_size);
extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid,
RelOptInfo *parent);
+extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel);
+extern RelOptInfo *build_grouped_rel(PlannerInfo *root, RelOptInfo *rel);
extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid);
extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid);
@@ -351,4 +353,6 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root,
SpecialJoinInfo *sjinfo,
int nappinfos, AppendRelInfo **appinfos);
+extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel,
+ bool calculate_grouped_rows);
#endif /* PATHNODE_H */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index cbade77b717..f6a62df0b43 100644
--- a/src/include/optimizer/paths.h
+++ b/src/include/optimizer/paths.h
@@ -21,7 +21,9 @@
* allpaths.c
*/
extern PGDLLIMPORT bool enable_geqo;
+extern PGDLLIMPORT bool enable_eager_aggregate;
extern PGDLLIMPORT int geqo_threshold;
+extern PGDLLIMPORT double min_eager_agg_group_size;
extern PGDLLIMPORT int min_parallel_table_scan_size;
extern PGDLLIMPORT int min_parallel_index_scan_size;
extern PGDLLIMPORT bool enable_group_by_reordering;
@@ -57,6 +59,8 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel,
bool override_rows);
+extern void generate_grouped_paths(PlannerInfo *root, RelOptInfo *grouped_rel,
+ RelOptInfo *rel);
extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages,
double index_pages, int max_workers);
extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel,
diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h
index 9d3debcab28..09b48b26f8f 100644
--- a/src/include/optimizer/planmain.h
+++ b/src/include/optimizer/planmain.h
@@ -76,6 +76,7 @@ extern void add_vars_to_targetlist(PlannerInfo *root, List *vars,
extern void add_vars_to_attr_needed(PlannerInfo *root, List *vars,
Relids where_needed);
extern void remove_useless_groupby_columns(PlannerInfo *root);
+extern void setup_eager_aggregation(PlannerInfo *root);
extern void find_lateral_references(PlannerInfo *root);
extern void rebuild_lateral_attr_needed(PlannerInfo *root);
extern void create_lateral_join_info(PlannerInfo *root);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 69805d4b9ec..ef79d6f1ded 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2437,11 +2437,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2449,10 +2449,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2464,11 +2466,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> HashAggregate
+ -> Finalize HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2476,10 +2478,12 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(13 rows)
+ -> Partial HashAggregate
+ Group Key: t2.c
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(15 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
new file mode 100644
index 00000000000..fc0f8c14ec9
--- /dev/null
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -0,0 +1,1714 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+--
+-- Test eager aggregation over base rel
+--
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b
+ Sort Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b
+(21 rows)
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test eager aggregation over join rel
+--
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(25 rows)
+
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg((t2.c + t3.c))
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg((t2.c + t3.c)))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg((t2.c + t3.c)))
+ -> Partial GroupAggregate
+ Output: t2.b, PARTIAL avg((t2.c + t3.c))
+ Group Key: t2.b
+ -> Sort
+ Output: t2.c, t2.b, t3.c
+ Sort Key: t2.b
+ -> Hash Join
+ Output: t2.c, t2.b, t3.c
+ Hash Cond: (t3.a = t2.a)
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a, t3.b, t3.c
+ -> Hash
+ Output: t2.c, t2.b, t2.a
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.c, t2.b, t2.a
+(28 rows)
+
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 497
+ 2 | 499
+ 3 | 501
+ 4 | 503
+ 5 | 505
+ 6 | 507
+ 7 | 509
+ 8 | 511
+ 9 | 513
+(9 rows)
+
+RESET enable_hashagg;
+--
+-- Test that eager aggregation works for outer join
+--
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Right Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ | 505
+(10 rows)
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c)
+ FROM eager_agg_t1 t1
+ LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (avg(t2.c))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, avg(t2.c)
+ Group Key: t2.b
+ -> Hash Right Join
+ Output: t2.b, t2.c
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t1.b
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.b
+(15 rows)
+
+SELECT t2.b, avg(t2.c)
+ FROM eager_agg_t1 t1
+ LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t2.b ORDER BY t2.b;
+ b | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+ |
+(10 rows)
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+---------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Gather Merge
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Workers Planned: 2
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Parallel Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Parallel Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Parallel Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Parallel Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(21 rows)
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+--
+-- Test eager aggregation with GEQO
+--
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t1.a, avg(t2.c)
+ Group Key: t1.a
+ -> Sort
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Sort Key: t1.a
+ -> Hash Join
+ Output: t1.a, (PARTIAL avg(t2.c))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL avg(t2.c))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL avg(t2.c)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ a | avg
+---+-----
+ 1 | 496
+ 2 | 497
+ 3 | 498
+ 4 | 499
+ 5 | 500
+ 6 | 501
+ 7 | 502
+ 8 | 503
+ 9 | 504
+(9 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+--
+-- Test eager aggregation for partitionwise join
+--
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each
+-- partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.y ORDER BY t2.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t2.y, (sum(t1.y)), (count(*))
+ Sort Key: t2.y
+ -> Append
+ -> Finalize HashAggregate
+ Output: t2.y, sum(t1.y), count(*)
+ Group Key: t2.y
+ -> Hash Join
+ Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.y, t1.x
+ -> Finalize HashAggregate
+ Output: t2_1.y, sum(t1_1.y), count(*)
+ Group Key: t2_1.y
+ -> Hash Join
+ Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Finalize HashAggregate
+ Output: t2_2.y, sum(t1_2.y), count(*)
+ Group Key: t2_2.y
+ -> Hash Join
+ Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.y, t1_2.x
+(49 rows)
+
+SELECT t2.y, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.y ORDER BY t2.y;
+ y | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for
+-- each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t2.x, (sum(t1.x)), (count(*))
+ Sort Key: t2.x
+ -> Finalize HashAggregate
+ Output: t2.x, sum(t1.x), count(*)
+ Group Key: t2.x
+ Filter: (avg(t1.x) > '5'::numeric)
+ -> Append
+ -> Hash Join
+ Output: t2.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.x, t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.x)), (PARTIAL count(*)), (PARTIAL avg(t1.x))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.x), PARTIAL count(*), PARTIAL avg(t1.x)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.x, t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.x, t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(44 rows)
+
+SELECT t2.x, sum(t1.x), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+ x | sum | count
+---+-------+-------
+ 0 | 33835 | 6667
+ 1 | 39502 | 6667
+ 2 | 46169 | 6667
+ 3 | 52836 | 6667
+ 4 | 59503 | 6667
+ 5 | 33500 | 6667
+ 6 | 39837 | 6667
+ 7 | 46504 | 6667
+ 8 | 53171 | 6667
+ 9 | 59838 | 6667
+(10 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y)))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y))
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y)))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y))
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y)))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y))
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y)))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+(70 rows)
+
+SELECT t1.x, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum
+----+---------
+ 0 | 1437480
+ 1 | 2082896
+ 2 | 2684422
+ 3 | 3285948
+ 4 | 3887474
+ 5 | 1526260
+ 6 | 2127786
+ 7 | 2729312
+ 8 | 3330838
+ 9 | 3932364
+ 10 | 1481370
+ 11 | 2012472
+ 12 | 2587464
+ 13 | 3162456
+ 14 | 3737448
+(15 rows)
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t3.y, sum((t2.y + t3.y))
+ Group Key: t3.y
+ -> Sort
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Sort Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y)))
+ Hash Cond: (t2.x = t1.x)
+ -> Partial GroupAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y))
+ Group Key: t2.x, t3.y, t3.x
+ -> Incremental Sort
+ Output: t2.y, t2.x, t3.y, t3.x
+ Sort Key: t2.x, t3.y
+ Presorted Key: t2.x
+ -> Merge Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Merge Cond: (t2.x = t3.x)
+ -> Sort
+ Output: t2.y, t2.x
+ Sort Key: t2.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t2
+ Output: t2.y, t2.x
+ -> Sort
+ Output: t3.y, t3.x
+ Sort Key: t3.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t3
+ Output: t3.y, t3.x
+ -> Hash
+ Output: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y)))
+ Hash Cond: (t2_1.x = t1_1.x)
+ -> Partial GroupAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y))
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Incremental Sort
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Sort Key: t2_1.x, t3_1.y
+ Presorted Key: t2_1.x
+ -> Merge Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Merge Cond: (t2_1.x = t3_1.x)
+ -> Sort
+ Output: t2_1.y, t2_1.x
+ Sort Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Sort
+ Output: t3_1.y, t3_1.x
+ Sort Key: t3_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash
+ Output: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y)))
+ Hash Cond: (t2_2.x = t1_2.x)
+ -> Partial GroupAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y))
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Incremental Sort
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Sort Key: t2_2.x, t3_2.y
+ Presorted Key: t2_2.x
+ -> Merge Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Merge Cond: (t2_2.x = t3_2.x)
+ -> Sort
+ Output: t2_2.y, t2_2.x
+ Sort Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Sort
+ Output: t3_2.y, t3_2.x
+ Sort Key: t3_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash
+ Output: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x
+(88 rows)
+
+SELECT t3.y, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+ y | sum
+---+---------
+ 0 | 1111110
+ 1 | 2000132
+ 2 | 2889154
+ 3 | 3778176
+ 4 | 4667198
+ 5 | 3334000
+ 6 | 4223022
+ 7 | 5112044
+ 8 | 6001066
+ 9 | 6890088
+(10 rows)
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t1.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t1.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ Hash Cond: (t2.y = t1.x)
+ -> Seq Scan on public.eager_agg_tab2_p1 t2
+ Output: t2.y
+ -> Hash
+ Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*)
+ Group Key: t1.x
+ -> Seq Scan on public.eager_agg_tab1_p1 t1
+ Output: t1.x, t1.y
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t1_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ Hash Cond: (t2_1.y = t1_1.x)
+ -> Seq Scan on public.eager_agg_tab2_p2 t2_1
+ Output: t2_1.y
+ -> Hash
+ Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*)
+ Group Key: t1_1.x
+ -> Seq Scan on public.eager_agg_tab1_p2 t1_1
+ Output: t1_1.x, t1_1.y
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t1_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ Hash Cond: (t2_2.y = t1_2.x)
+ -> Seq Scan on public.eager_agg_tab2_p3 t2_2
+ Output: t2_2.y
+ -> Hash
+ Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*)
+ Group Key: t1_2.x
+ -> Seq Scan on public.eager_agg_tab1_p3 t1_2
+ Output: t1_2.x, t1_2.y
+(49 rows)
+
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 10890 | 4356
+ 1 | 15544 | 4489
+ 2 | 20033 | 4489
+ 3 | 24522 | 4489
+ 4 | 29011 | 4489
+ 5 | 11390 | 4489
+ 6 | 15879 | 4489
+ 7 | 20368 | 4489
+ 8 | 24857 | 4489
+ 9 | 29346 | 4489
+ 10 | 11055 | 4489
+ 11 | 15246 | 4356
+ 12 | 19602 | 4356
+ 13 | 23958 | 4356
+ 14 | 28314 | 4356
+(15 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+ANALYZE eager_agg_tab_ml;
+-- When GROUP BY clause matches; full aggregation is performed for each
+-- partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- When GROUP BY clause does not match; partial aggregation is performed for
+-- each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.y ORDER BY t1.y;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.y, (sum(t2.y)), (count(*))
+ Sort Key: t1.y
+ -> Finalize HashAggregate
+ Output: t1.y, sum(t2.y), count(*)
+ Group Key: t1.y
+ -> Append
+ -> Hash Join
+ Output: t1.y, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.y, t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash Join
+ Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.y, t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash Join
+ Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.y, t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash Join
+ Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.y, t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash Join
+ Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.y, t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(67 rows)
+
+SELECT t1.y, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.y ORDER BY t1.y;
+ y | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum((t2.y + t3.y)), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(114 rows)
+
+SELECT t1.x, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------
+ Sort
+ Output: t3.y, (sum((t2.y + t3.y))), (count(*))
+ Sort Key: t3.y
+ -> Finalize HashAggregate
+ Output: t3.y, sum((t2.y + t3.y)), count(*)
+ Group Key: t3.y
+ -> Append
+ -> Hash Join
+ Output: t3.y, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, t3.y, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, t3.y, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*)
+ Group Key: t2.x, t3.y, t3.x
+ -> Hash Join
+ Output: t2.y, t2.x, t3.y, t3.x
+ Hash Cond: (t2.x = t3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Hash
+ Output: t3.y, t3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t3
+ Output: t3.y, t3.x
+ -> Hash Join
+ Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*)
+ Group Key: t2_1.x, t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x
+ Hash Cond: (t2_1.x = t3_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Hash
+ Output: t3_1.y, t3_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1
+ Output: t3_1.y, t3_1.x
+ -> Hash Join
+ Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, t3_2.y, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*)
+ Group Key: t2_2.x, t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x
+ Hash Cond: (t2_2.x = t3_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Hash
+ Output: t3_2.y, t3_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2
+ Output: t3_2.y, t3_2.x
+ -> Hash Join
+ Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*)
+ Group Key: t2_3.x, t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x
+ Hash Cond: (t2_3.x = t3_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Hash
+ Output: t3_3.y, t3_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3
+ Output: t3_3.y, t3_3.x
+ -> Hash Join
+ Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*)
+ Group Key: t2_4.x, t3_4.y, t3_4.x
+ -> Hash Join
+ Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x
+ Hash Cond: (t2_4.x = t3_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+ -> Hash
+ Output: t3_4.y, t3_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4
+ Output: t3_4.y, t3_4.x
+(102 rows)
+
+SELECT t3.y, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+ y | sum | count
+----+---------+-------
+ 0 | 0 | 35937
+ 1 | 78608 | 39304
+ 2 | 157216 | 39304
+ 3 | 235824 | 39304
+ 4 | 314432 | 39304
+ 5 | 393040 | 39304
+ 6 | 471648 | 39304
+ 7 | 550256 | 39304
+ 8 | 628864 | 39304
+ 9 | 707472 | 39304
+ 10 | 786080 | 39304
+ 11 | 790614 | 35937
+ 12 | 862488 | 35937
+ 13 | 934362 | 35937
+ 14 | 1006236 | 35937
+ 15 | 1078110 | 35937
+ 16 | 1149984 | 35937
+ 17 | 1221858 | 35937
+ 18 | 1293732 | 35937
+ 19 | 1365606 | 35937
+ 20 | 1437480 | 35937
+ 21 | 1509354 | 35937
+ 22 | 1581228 | 35937
+ 23 | 1653102 | 35937
+ 24 | 1724976 | 35937
+ 25 | 1796850 | 35937
+ 26 | 1868724 | 35937
+ 27 | 1940598 | 35937
+ 28 | 2012472 | 35937
+ 29 | 2084346 | 35937
+(30 rows)
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+ QUERY PLAN
+---------------------------------------------------------------------------------------
+ Sort
+ Output: t1.x, (sum(t2.y)), (count(*))
+ Sort Key: t1.x
+ -> Append
+ -> Finalize HashAggregate
+ Output: t1.x, sum(t2.y), count(*)
+ Group Key: t1.x
+ -> Hash Join
+ Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ Hash Cond: (t1.x = t2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t1
+ Output: t1.x
+ -> Hash
+ Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*)
+ Group Key: t2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p1 t2
+ Output: t2.y, t2.x
+ -> Finalize HashAggregate
+ Output: t1_1.x, sum(t2_1.y), count(*)
+ Group Key: t1_1.x
+ -> Hash Join
+ Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ Hash Cond: (t1_1.x = t2_1.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1
+ Output: t1_1.x
+ -> Hash
+ Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*)
+ Group Key: t2_1.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1
+ Output: t2_1.y, t2_1.x
+ -> Finalize HashAggregate
+ Output: t1_2.x, sum(t2_2.y), count(*)
+ Group Key: t1_2.x
+ -> Hash Join
+ Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ Hash Cond: (t1_2.x = t2_2.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2
+ Output: t1_2.x
+ -> Hash
+ Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*)
+ Group Key: t2_2.x
+ -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2
+ Output: t2_2.y, t2_2.x
+ -> Finalize HashAggregate
+ Output: t1_3.x, sum(t2_3.y), count(*)
+ Group Key: t1_3.x
+ -> Hash Join
+ Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ Hash Cond: (t1_3.x = t2_3.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3
+ Output: t1_3.x
+ -> Hash
+ Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*)
+ Group Key: t2_3.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3
+ Output: t2_3.y, t2_3.x
+ -> Finalize HashAggregate
+ Output: t1_4.x, sum(t2_4.y), count(*)
+ Group Key: t1_4.x
+ -> Hash Join
+ Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ Hash Cond: (t1_4.x = t2_4.x)
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4
+ Output: t1_4.x
+ -> Hash
+ Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*)
+ Group Key: t2_4.x
+ -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4
+ Output: t2_4.y, t2_4.x
+(79 rows)
+
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+ x | sum | count
+----+-------+-------
+ 0 | 0 | 1089
+ 1 | 1156 | 1156
+ 2 | 2312 | 1156
+ 3 | 3468 | 1156
+ 4 | 4624 | 1156
+ 5 | 5780 | 1156
+ 6 | 6936 | 1156
+ 7 | 8092 | 1156
+ 8 | 9248 | 1156
+ 9 | 10404 | 1156
+ 10 | 11560 | 1156
+ 11 | 11979 | 1089
+ 12 | 13068 | 1089
+ 13 | 14157 | 1089
+ 14 | 15246 | 1089
+ 15 | 16335 | 1089
+ 16 | 17424 | 1089
+ 17 | 18513 | 1089
+ 18 | 19602 | 1089
+ 19 | 20691 | 1089
+ 20 | 21780 | 1089
+ 21 | 22869 | 1089
+ 22 | 23958 | 1089
+ 23 | 25047 | 1089
+ 24 | 26136 | 1089
+ 25 | 27225 | 1089
+ 26 | 28314 | 1089
+ 27 | 29403 | 1089
+ 28 | 30492 | 1089
+ 29 | 31581 | 1089
+(30 rows)
+
+RESET geqo;
+RESET geqo_threshold;
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index cd37f549b5a..bdbf21a874d 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -2840,20 +2840,22 @@ select x.thousand, x.twothousand, count(*)
from tenk1 x inner join tenk1 y on x.thousand = y.thousand
group by x.thousand, x.twothousand
order by x.thousand desc, x.twothousand;
- QUERY PLAN
-----------------------------------------------------------------------------------
- GroupAggregate
+ QUERY PLAN
+----------------------------------------------------------------------------------------
+ Finalize GroupAggregate
Group Key: x.thousand, x.twothousand
-> Incremental Sort
Sort Key: x.thousand DESC, x.twothousand
Presorted Key: x.thousand
-> Merge Join
Merge Cond: (y.thousand = x.thousand)
- -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
+ -> Partial GroupAggregate
+ Group Key: y.thousand
+ -> Index Only Scan Backward using tenk1_thous_tenthous on tenk1 y
-> Sort
Sort Key: x.thousand DESC
-> Seq Scan on tenk1 x
-(11 rows)
+(13 rows)
reset enable_hashagg;
reset enable_nestloop;
diff --git a/src/test/regress/expected/partition_aggregate.out b/src/test/regress/expected/partition_aggregate.out
index cb12bf53719..fc84929a002 100644
--- a/src/test/regress/expected/partition_aggregate.out
+++ b/src/test/regress/expected/partition_aggregate.out
@@ -13,6 +13,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
--
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 83228cfca29..3b37fafa65b 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -151,6 +151,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_async_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
+ enable_eager_aggregate | on
enable_gathermerge | on
enable_group_by_reordering | on
enable_hashagg | on
@@ -172,7 +173,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(24 rows)
+(25 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index fbffc67ae60..f9450cdc477 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -123,7 +123,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
# The stats test resets stats, so nothing else needing stats access can be in
# this group.
# ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression compression_lz4 memoize stats predicate numa eager_aggregate
# event_trigger depends on create_am and cannot run concurrently with
# any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 00000000000..e328a83b4c7
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,380 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c + t3.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+ JOIN eager_agg_t3 t3 ON t2.a = t3.a
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c)
+ FROM eager_agg_t1 t1
+ LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t2.b ORDER BY t2.b;
+
+SELECT t2.b, avg(t2.c)
+ FROM eager_agg_t1 t1
+ LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+--
+-- Test eager aggregation with GEQO
+--
+
+SET geqo = on;
+SET geqo_threshold = 2;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+SELECT t1.a, avg(t2.c)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (5);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (5) TO (10);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (15);
+INSERT INTO eager_agg_tab1 SELECT i % 15, i % 10 FROM generate_series(1, 1000) i;
+INSERT INTO eager_agg_tab2 SELECT i % 10, i % 15 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each
+-- partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.y ORDER BY t2.y;
+
+SELECT t2.y, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for
+-- each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+SELECT t2.x, sum(t1.x), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t2.x HAVING avg(t1.x) > 5 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+SET max_parallel_workers_per_gather TO 0;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+
+SELECT t3.y, sum(t2.y + t3.y)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab1 t2 ON t1.x = t2.x
+ JOIN eager_agg_tab1 t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+
+RESET enable_hashagg;
+RESET max_parallel_workers_per_gather;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t1.y), count(*)
+ FROM eager_agg_tab1 t1
+ JOIN eager_agg_tab2 t2 ON t1.x = t2.y
+GROUP BY t1.x ORDER BY t1.x;
+
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each
+-- partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for
+-- each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.y ORDER BY t1.y;
+
+SELECT t1.y, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+
+SELECT t3.y, sum(t2.y + t3.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+ JOIN eager_agg_tab_ml t3 ON t2.x = t3.x
+GROUP BY t3.y ORDER BY t3.y;
+
+-- try that with GEQO too
+SET geqo = on;
+SET geqo_threshold = 2;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+
+SELECT t1.x, sum(t2.y), count(*)
+ FROM eager_agg_tab_ml t1
+ JOIN eager_agg_tab_ml t2 ON t1.x = t2.x
+GROUP BY t1.x ORDER BY t1.x;
+
+RESET geqo;
+RESET geqo_threshold;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/test/regress/sql/partition_aggregate.sql b/src/test/regress/sql/partition_aggregate.sql
index ab070fee244..124cc260461 100644
--- a/src/test/regress/sql/partition_aggregate.sql
+++ b/src/test/regress/sql/partition_aggregate.sql
@@ -14,6 +14,8 @@ SET enable_partitionwise_join TO true;
SET max_parallel_workers_per_gather TO 0;
-- Disable incremental sort, which can influence selected plans due to fuzz factor.
SET enable_incremental_sort TO off;
+-- Disable eager aggregation, which can interfere with the generation of partitionwise aggregation.
+SET enable_eager_aggregate TO off;
--
-- Tests for list partitioned tables.
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 37f26f6c6b7..02b5b041c45 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -42,6 +42,7 @@ AfterTriggersTableData
AfterTriggersTransData
Agg
AggClauseCosts
+AggClauseInfo
AggInfo
AggPath
AggSplit
@@ -1110,6 +1111,7 @@ GroupPathExtraData
GroupResultPath
GroupState
GroupVarInfo
+GroupingExprInfo
GroupingFunc
GroupingSet
GroupingSetData
@@ -2473,6 +2475,7 @@ ReindexObjectType
ReindexParams
ReindexStmt
ReindexType
+RelAggInfo
RelFileLocator
RelFileLocatorBackend
RelFileNumber
--
2.39.5 (Apple Git-154)
* Re: Eager aggregation, take 3
@ 2025-10-08 11:14 David Rowley <[email protected]>
parent: Richard Guo <[email protected]>
1 sibling, 1 reply; 55+ messages in thread
From: David Rowley @ 2025-10-08 11:14 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Tue, 7 Oct 2025 at 23:57, Richard Guo <[email protected]> wrote:
>
> On Mon, Oct 6, 2025 at 10:59 PM David Rowley <[email protected]> wrote:
> > 6. Shouldn't this be using lappend()?
> >
> > agg_clause_list = list_append_unique(agg_clause_list, ac_info);
> >
> > I don't understand why ac_info could already be in the list. You've
> > just done: ac_info = makeNode(AggClauseInfo);
>
> A query can specify the same Aggref expressions multiple times in the
> target list. Using lappend here can lead to duplicate partial Aggref
> nodes in the targetlist of a grouped path, which is what I want to
> avoid.
I was getting that mixed up with list_append_unique_ptr().
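The difference between the two functions can be sketched in plain C. This is only a toy model under stated assumptions, not PostgreSQL's List API: ToyAgg, ToyList, toy_equal(), append_unique() and append_unique_ptr() are hypothetical stand-ins for a planner node, a List, equal(), list_append_unique() and list_append_unique_ptr().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a node describing one aggregate call. */
typedef struct ToyAgg
{
    const char *aggname;        /* e.g. "avg" */
    const char *argname;        /* e.g. "t2.c" */
} ToyAgg;

#define TOY_MAX 8

/* Hypothetical fixed-size list of node pointers. */
typedef struct ToyList
{
    const ToyAgg *items[TOY_MAX];
    size_t      length;
} ToyList;

/* Structural comparison, playing the role of equal(). */
static int
toy_equal(const ToyAgg *a, const ToyAgg *b)
{
    return strcmp(a->aggname, b->aggname) == 0 &&
        strcmp(a->argname, b->argname) == 0;
}

/* Append unless a structurally equal item is already present. */
static void
append_unique(ToyList *list, const ToyAgg *item)
{
    for (size_t i = 0; i < list->length; i++)
        if (toy_equal(list->items[i], item))
            return;
    assert(list->length < TOY_MAX);
    list->items[list->length++] = item;
}

/* Append unless this exact pointer is already present. */
static void
append_unique_ptr(ToyList *list, const ToyAgg *item)
{
    for (size_t i = 0; i < list->length; i++)
        if (list->items[i] == item)
            return;
    assert(list->length < TOY_MAX);
    list->items[list->length++] = item;
}
```

With two separately allocated nodes that both describe avg(t2.c), append_unique() keeps one entry, while append_unique_ptr() keeps both, because a freshly made node can never match an existing pointer.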
> > 9. In get_expression_sortgroupref(), a comment claims "We ignore child
> > members here.". I think that's outdated since ec_members no longer has
> > child members.
>
> I think that comment is used to explain why we only scan ec_members
> here. Similar comments can be found in many other places, such as in
> equivclass.c:
>
> /*
> * Found our match. Scan the other EC members and attempt to generate
> * joinclauses. Ignore children here.
> */
> foreach(lc2, cur_ec->ec_members)
> {
I'd say that's also wrong. "Ignore" means not to pay attention to
something that's there. The child members are not there.
> > 11. The way you've written the header comments for typedef struct
> > RelAggInfo seems weird. I've only ever seen extra details in the
> > header comment when the inline comments have been kept to a single
> > line. You're spanning multiple lines, so why have the out of line
> > comments in the header at all?
> I've also updated the comments within RelAggInfo to use one-line
> style.
The style I'd thought of had the comments on the same line as the
field. Something like struct EquivalenceClass.
> I wrapped the long queries in v24.
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
The above comment and command mismatch to my understanding from
looking at postgresql.conf.sample and guc_parameters.dat.
David
* Re: Eager aggregation, take 3
@ 2025-10-08 14:45 Robert Haas <[email protected]>
parent: Richard Guo <[email protected]>
1 sibling, 1 reply; 55+ messages in thread
From: Robert Haas @ 2025-10-08 14:45 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: David Rowley <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Tue, Oct 7, 2025 at 6:57 AM Richard Guo <[email protected]> wrote:
> > 10. I don't think this comment quite makes sense:
> >
> > * "apply_at" tracks the lowest join level at which partial aggregation is
> > * applied.
> >
> > maybe "minimum set of rels to join before partial aggregation can be applied"?
> I've updated the comment for apply_at to clarify that it refers to the
> relids at which partial aggregation is applied.
>
> I've also updated the comments within RelAggInfo to use one-line
> style.
>
> I retained the name of this field though.
For what it's worth, I also don't like that field name. I'm not sure
what to propose instead, but I don't think apply_at is very clear.
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: Eager aggregation, take 3
@ 2025-10-09 01:48 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
1 sibling, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-10-09 01:48 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Mon, Oct 6, 2025 at 9:59 AM Richard Guo <[email protected]> wrote:
> On Mon, Sep 29, 2025 at 11:09 AM Richard Guo <[email protected]> wrote:
> > FWIW, I plan to do another self-review of this patch soon, with the
> > goal of assessing whether it's ready to be pushed. If anyone has any
> > concerns about any part of the patch or would like to review it, I
> > would greatly appreciate hearing from you.
> Barring any objections, I'll plan to push v23 in a couple of days.
I've pushed v24 -- thanks for all the reviews! Now bracing for the
upcoming bug reports.
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-09 01:49 Richard Guo <[email protected]>
parent: David Rowley <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-10-09 01:49 UTC (permalink / raw)
To: David Rowley <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Wed, Oct 8, 2025 at 8:14 PM David Rowley <[email protected]> wrote:
> +-- Enable eager aggregation, which by default is disabled.
> +SET enable_eager_aggregate TO on;
> The above comment and command mismatch to my understanding from
> looking at postgresql.conf.sample and guc_parameters.dat.
Right. This GUC was disabled by default prior to v17 of the patch, and
this comment is a leftover from that. Will push a fix. Thanks for
pointing it out!
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-09 01:51 Richard Guo <[email protected]>
parent: Robert Haas <[email protected]>
0 siblings, 0 replies; 55+ messages in thread
From: Richard Guo @ 2025-10-09 01:51 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: David Rowley <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Wed, Oct 8, 2025 at 11:45 PM Robert Haas <[email protected]> wrote:
> On Tue, Oct 7, 2025 at 6:57 AM Richard Guo <[email protected]> wrote:
> > I retained the name of this field though.
> For what it's worth, I also don't like that field name. I'm not sure
> what to propose instead, but I don't think apply_at is very clear.
This field represents the set of relids at which partial aggregation
is applied. So how about naming it partial_agg_designated_relids?
That feels a bit verbose, though. How about partial_agg_relids or,
for brevity, agg_relids instead?
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-09 02:13 Tom Lane <[email protected]>
parent: Richard Guo <[email protected]>
2 siblings, 1 reply; 55+ messages in thread
From: Tom Lane @ 2025-10-09 02:13 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Robert Haas <[email protected]>; David Rowley <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
Richard Guo <[email protected]> writes:
> On Wed, Oct 8, 2025 at 11:45 PM Robert Haas <[email protected]> wrote:
>> For what it's worth, I also don't like that field name. I'm not sure
>> what to propose instead, but I don't think apply_at is very clear.
> This field represents the set of relids at which partial aggregation
> is applied. So how about naming it partial_agg_designated_relids?
> That feels a bit verbose, though. How about partial_agg_relids or,
> for brevity, agg_relids instead?
I might be missing a subtlety here, but how about
"apply_aggregation_at" or "apply_partial_agg_at"?
I don't think including "relids" in the field name adds anything,
given the field's declared type and comments.
regards, tom lane
* Re: Eager aggregation, take 3
@ 2025-10-09 03:10 Richard Guo <[email protected]>
parent: Tom Lane <[email protected]>
0 siblings, 0 replies; 55+ messages in thread
From: Richard Guo @ 2025-10-09 03:10 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Robert Haas <[email protected]>; David Rowley <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Thu, Oct 9, 2025 at 11:13 AM Tom Lane <[email protected]> wrote:
> Richard Guo <[email protected]> writes:
> > On Wed, Oct 8, 2025 at 11:45 PM Robert Haas <[email protected]> wrote:
> >> For what it's worth, I also don't like that field name. I'm not sure
> >> what to propose instead, but I don't think apply_at is very clear.
> > This field represents the set of relids at which partial aggregation
> > is applied. So how about naming it partial_agg_designated_relids?
> > That feels a bit verbose, though. How about partial_agg_relids or,
> > for brevity, agg_relids instead?
> I might be missing a subtlety here, but how about
> "apply_aggregation_at" or "apply_partial_agg_at"?
>
> I don't think including "relids" in the field name adds anything,
> given the field's declared type and comments.
Fair point.
'agg' seems better to me than 'aggregation' when used in a name: it's
shorter, and it's unlikely anyone would interpret it as anything other
than "aggregation".
I kind of wonder whether we need to include 'partial' in the name.
Given the context, it seems very clear that we're referring to
partial aggregation rather than final aggregation.
So I'm weighing between "apply_partial_agg_at" and "apply_agg_at".
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-09 05:09 Antonin Houska <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Antonin Houska @ 2025-10-09 05:09 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
Richard Guo <[email protected]> wrote:
> On Mon, Oct 6, 2025 at 9:59 AM Richard Guo <[email protected]> wrote:
> > On Mon, Sep 29, 2025 at 11:09 AM Richard Guo <[email protected]> wrote:
> > > FWIW, I plan to do another self-review of this patch soon, with the
> > > goal of assessing whether it's ready to be pushed. If anyone has any
> > > concerns about any part of the patch or would like to review it, I
> > > would greatly appreciate hearing from you.
>
> > Barring any objections, I'll plan to push v23 in a couple of days.
>
> I've pushed v24 -- thanks for all the reviews! Now bracing for the
> upcoming bug reports.
Thanks for finishing this! The lack of feedback I encountered earlier made me
so frustrated that I could not find motivation to collaborate with you. I'm
happy now that my effort did not get wasted.
--
Antonin Houska
Web: https://www.cybertec-postgresql.com
* Re: Eager aggregation, take 3
@ 2025-10-09 07:01 Richard Guo <[email protected]>
parent: Antonin Houska <[email protected]>
0 siblings, 0 replies; 55+ messages in thread
From: Richard Guo @ 2025-10-09 07:01 UTC (permalink / raw)
To: Antonin Houska <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Thu, Oct 9, 2025 at 2:09 PM Antonin Houska <[email protected]> wrote:
> Richard Guo <[email protected]> wrote:
> > I've pushed v24 -- thanks for all the reviews! Now bracing for the
> > upcoming bug reports.
> Thanks for finishing this! The lack of feedback I encountered earlier made me
> so frustrated that I could not find motivation to collaborate with you. I'm
> happy now that my effort did not get wasted.
Your efforts in the earlier versions were very important for getting
this feature done. Thank you for your work.
- Richard
* Re: Eager aggregation, take 3
@ 2025-10-09 08:07 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2025-10-09 08:07 UTC (permalink / raw)
To: David Rowley <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Thu, Oct 9, 2025 at 10:49 AM Richard Guo <[email protected]> wrote:
> On Wed, Oct 8, 2025 at 8:14 PM David Rowley <[email protected]> wrote:
> > +-- Enable eager aggregation, which by default is disabled.
> > +SET enable_eager_aggregate TO on;
>
> > The above comment and command mismatch to my understanding from
> > looking at postgresql.conf.sample and guc_parameters.dat.
> Right. This GUC was disabled by default prior to v17, and this is a
> leftover from that. Will push a fix. Thanks for pointing it out!
I noticed an unnecessary header include in initsplan.c. Will fix that
as well.
- Richard
* Re: Eager aggregation, take 3
@ 2026-03-30 03:17 Richard Guo <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2026-03-30 03:17 UTC (permalink / raw)
To: David Rowley <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Matheus Alcantara <[email protected]>
On Thu, Oct 9, 2025 at 5:07 PM Richard Guo <[email protected]> wrote:
> I noticed an unnecessary header include in initsplan.c. Will fix that
> as well.
I noticed a couple of issues that can lead to unexpected results.
I've attached two patches to fix them.
1. Eager aggregation was incorrectly checking the data type's default
collation rather than the expression's actual collation. This allowed
columns with non-deterministic collations to be pushed down, resulting
in incorrect grouping. Fixed by 0001.
2. Pushing aggregates containing volatile functions below a join
alters their execution count. Fixed by 0002.
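To illustrate the second issue, here is a minimal C sketch of why the execution count changes. It is a toy model, not planner code: toy_volatile(), calls_with_late_agg() and calls_with_eager_agg() are hypothetical names, the tables are plain arrays of join keys, and the partial aggregate is collapsed into a single running sum because only the number of calls matters here.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many times the toy volatile function has run. */
static int  volatile_calls = 0;

/* Hypothetical volatile function: every call has a visible side effect. */
static int
toy_volatile(int x)
{
    volatile_calls++;
    return x;
}

/*
 * Aggregate above the join: the function runs once per joined row.
 * Returns the number of calls made.
 */
static int
calls_with_late_agg(const int *t1, size_t n1, const int *t2, size_t n2)
{
    long        sum = 0;

    volatile_calls = 0;
    for (size_t i = 0; i < n1; i++)
        for (size_t j = 0; j < n2; j++)
            if (t1[i] == t2[j])
                sum += toy_volatile(t1[i]);
    (void) sum;
    return volatile_calls;
}

/*
 * Partial aggregate pushed below the join: the function runs once per
 * t1 row, whether or not that row later finds a join partner.
 * Returns the number of calls made.
 */
static int
calls_with_eager_agg(const int *t1, size_t n1)
{
    long        partial = 0;

    assert(t1 != NULL || n1 == 0);
    volatile_calls = 0;
    for (size_t i = 0; i < n1; i++)
        partial += toy_volatile(t1[i]);
    (void) partial;
    return volatile_calls;
}
```

For t1 keys {1, 1, 2} and t2 keys {1, 1, 3}, aggregating after the join calls the function four times (two t1 rows, each matching two t2 rows), while aggregating below the join calls it three times (once per t1 row).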
(As briefly discussed on Discord, this non-deterministic collation
issue also exists in our long-standing logic for pushing HAVING down
to WHERE. But let's fix it for eager aggregation first.)
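The collation issue can be reproduced in miniature with the C sketch below. It is a toy model under stated assumptions, not planner code: eq_bytes() stands in for comparison under the "C" collation, eq_nocase() for a case-insensitive non-deterministic collation, and count_join_then_agg() and count_agg_then_join() are hypothetical names for the two plan shapes. The data is a scaled-down version of the regression test in 0001 (two 'a' and two 'A' rows instead of fifty each).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_GROUPS 8

/* Byte-wise equality, modelling the "C" collation of the join clause. */
static int
eq_bytes(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Case-insensitive equality, modelling a non-deterministic collation. */
static int
eq_nocase(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Join first, then count: each t1 row is tested against the join clause. */
static long
count_join_then_agg(const char **t1, size_t n1, const char *t2_val)
{
    long        count = 0;

    for (size_t i = 0; i < n1; i++)
        if (eq_bytes(t1[i], t2_val))
            count++;
    return count;
}

/*
 * Unsafe eager plan: group t1 by val using case-insensitive equality,
 * keeping the first value seen as the group's representative, and only
 * then apply the byte-wise join clause to that representative.
 */
static long
count_agg_then_join(const char **t1, size_t n1, const char *t2_val)
{
    const char *rep[TOY_MAX_GROUPS];
    long        partial[TOY_MAX_GROUPS];
    size_t      ngroups = 0;
    long        count = 0;

    for (size_t i = 0; i < n1; i++)
    {
        size_t      g;

        for (g = 0; g < ngroups; g++)
            if (eq_nocase(rep[g], t1[i]))
                break;
        if (g == ngroups)
        {
            assert(ngroups < TOY_MAX_GROUPS);
            rep[ngroups] = t1[i];
            partial[ngroups] = 0;
            ngroups++;
        }
        partial[g]++;
    }
    for (size_t g = 0; g < ngroups; g++)
        if (eq_bytes(rep[g], t2_val))
            count += partial[g];
    return count;
}
```

With t1 values {"a", "a", "A", "A"} joined against "A", joining first counts 2 rows. The eager plan merges all four rows into one group represented by "a", which the byte-wise join clause then rejects, so it counts 0. Had "A" been seen first, it would count 4, which is also wrong.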
- Richard
Attachments:
[application/octet-stream] v1-0001-Fix-collation-handling-for-grouping-keys-in-eager.patch (9.3K, 2-v1-0001-Fix-collation-handling-for-grouping-keys-in-eager.patch)
From 3e8997d52dae13b571745355e07678f35d878c0b Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Fri, 27 Mar 2026 17:51:17 +0900
Subject: [PATCH v1 1/2] Fix collation handling for grouping keys in eager
aggregation
When determining if it is safe to use an expression as a grouping key
for partial aggregation, eager aggregation relies on the B-tree
equalimage support function to ensure that equality implies image
equality.
Previously, the code incorrectly passed the default collation of the
expression's data type to the equalimage procedure, rather than the
expression's actual collation. As a result, if a column used a
non-deterministic collation but the base type's default collation was
deterministic, eager aggregation would incorrectly assume that the
column was safe for byte-level grouping. This could cause rows to be
prematurely grouped and subsequently discarded by strict join
conditions, resulting in incorrect query results.
This patch fixes the issue by passing the expression's actual
collation to the equalimage procedure.
---
src/backend/optimizer/plan/initsplan.c | 10 ++-
src/backend/optimizer/util/relnode.c | 10 ++-
.../regress/expected/collate.icu.utf8.out | 71 ++++++++++++++-----
src/test/regress/sql/collate.icu.utf8.sql | 31 ++++++++
4 files changed, 102 insertions(+), 20 deletions(-)
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index c20e7e49780..b207b8d913b 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -913,9 +913,17 @@ create_grouping_expr_infos(PlannerInfo *root)
tce->btree_opintype,
tce->btree_opintype,
BTEQUALIMAGE_PROC);
+
+ /*
+ * If there is no BTEQUALIMAGE_PROC, eager aggregation is assumed to
+ * be unsafe. Otherwise, we call the procedure to check. We must be
+ * careful to pass the expression's actual collation, rather than the
+ * data type's default collation, to ensure that non-deterministic
+ * collations are correctly handled.
+ */
if (!OidIsValid(equalimageproc) ||
!DatumGetBool(OidFunctionCall1Coll(equalimageproc,
- tce->typcollation,
+ exprCollation((Node *) tle->expr),
ObjectIdGetDatum(tce->btree_opintype))))
return;
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 91bcda34a37..3fc2c2f71d0 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -3004,9 +3004,17 @@ init_grouping_targets(PlannerInfo *root, RelOptInfo *rel,
tce->btree_opintype,
tce->btree_opintype,
BTEQUALIMAGE_PROC);
+
+ /*
+ * If there is no BTEQUALIMAGE_PROC, eager aggregation is assumed
+ * to be unsafe. Otherwise, we call the procedure to check. We
+ * must be careful to pass the expression's actual collation,
+ * rather than the data type's default collation, to ensure that
+ * non-deterministic collations are correctly handled.
+ */
if (!OidIsValid(equalimageproc) ||
!DatumGetBool(OidFunctionCall1Coll(equalimageproc,
- tce->typcollation,
+ exprCollation((Node *) expr),
ObjectIdGetDatum(tce->btree_opintype))))
return false;
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index d170e7da066..fbcdb7eb58c 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2454,11 +2454,11 @@ SELECT c collate "C", count(c) FROM pagg_tab3 GROUP BY c collate "C" ORDER BY 1;
SET enable_partitionwise_join TO false;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> Finalize HashAggregate
+ -> HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2466,12 +2466,10 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Partial HashAggregate
- Group Key: t2.c
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(15 rows)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(13 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2483,11 +2481,11 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
SET enable_partitionwise_join TO true;
EXPLAIN (COSTS OFF)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
- QUERY PLAN
--------------------------------------------------------------------
+ QUERY PLAN
+-------------------------------------------------------------
Sort
Sort Key: t1.c COLLATE "C"
- -> Finalize HashAggregate
+ -> HashAggregate
Group Key: t1.c
-> Hash Join
Hash Cond: (t1.c = t2.c)
@@ -2495,12 +2493,10 @@ SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROU
-> Seq Scan on pagg_tab3_p2 t1_1
-> Seq Scan on pagg_tab3_p1 t1_2
-> Hash
- -> Partial HashAggregate
- Group Key: t2.c
- -> Append
- -> Seq Scan on pagg_tab3_p2 t2_1
- -> Seq Scan on pagg_tab3_p1 t2_2
-(15 rows)
+ -> Append
+ -> Seq Scan on pagg_tab3_p2 t2_1
+ -> Seq Scan on pagg_tab3_p1 t2_2
+(13 rows)
SELECT t1.c, count(t2.c) FROM pagg_tab3 t1 JOIN pagg_tab3 t2 ON t1.c = t2.c GROUP BY 1 ORDER BY t1.c COLLATE "C";
c | count
@@ -2691,6 +2687,45 @@ DROP TABLE pagg_tab6;
RESET enable_partitionwise_aggregate;
RESET max_parallel_workers_per_gather;
RESET enable_incremental_sort;
+--
+-- Test for eager aggregation non-deterministic collation bug
+--
+CREATE TABLE eager_agg_t1 (id int, val text COLLATE case_insensitive);
+CREATE TABLE eager_agg_t2 (val text COLLATE case_insensitive);
+INSERT INTO eager_agg_t1 SELECT 1, 'a' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t1 SELECT 1, 'A' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t2 VALUES ('A');
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+-- Ensure that eager aggregation is not used for t1.val due to the
+-- non-deterministic collation.
+EXPLAIN (COSTS OFF)
+SELECT t1.id, count(t1.val)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.val = t2.val COLLATE "C"
+GROUP BY t1.id;
+ QUERY PLAN
+--------------------------------------------------------
+ HashAggregate
+ Group Key: t1.id
+ -> Nested Loop
+ Join Filter: ((t1.val)::text = (t2.val)::text)
+ -> Seq Scan on eager_agg_t2 t2
+ -> Seq Scan on eager_agg_t1 t1
+(6 rows)
+
+-- Ensure it returns 1 row with count = 50
+SELECT t1.id, count(t1.val)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.val = t2.val COLLATE "C"
+GROUP BY t1.id;
+ id | count
+----+-------
+ 1 | 50
+(1 row)
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
-- virtual generated columns
CREATE TABLE t5 (
a int,
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 8f0f973f5fa..0e6b76b11b8 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -990,6 +990,37 @@ RESET enable_partitionwise_aggregate;
RESET max_parallel_workers_per_gather;
RESET enable_incremental_sort;
+--
+-- Test for eager aggregation non-deterministic collation bug
+--
+
+CREATE TABLE eager_agg_t1 (id int, val text COLLATE case_insensitive);
+CREATE TABLE eager_agg_t2 (val text COLLATE case_insensitive);
+
+INSERT INTO eager_agg_t1 SELECT 1, 'a' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t1 SELECT 1, 'A' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t2 VALUES ('A');
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+
+-- Ensure that eager aggregation is not used for t1.val due to the
+-- non-deterministic collation.
+EXPLAIN (COSTS OFF)
+SELECT t1.id, count(t1.val)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.val = t2.val COLLATE "C"
+GROUP BY t1.id;
+
+-- Ensure it returns 1 row with count = 50
+SELECT t1.id, count(t1.val)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.val = t2.val COLLATE "C"
+GROUP BY t1.id;
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+
-- virtual generated columns
CREATE TABLE t5 (
a int,
--
2.39.5 (Apple Git-154)
[application/octet-stream] v1-0002-Fix-volatile-function-evaluation-in-eager-aggrega.patch (3.2K, 3-v1-0002-Fix-volatile-function-evaluation-in-eager-aggrega.patch)
download | inline diff:
From 0476ff98a83317642a16bca9a5b1eef97925dbd8 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Sat, 28 Mar 2026 16:52:37 +0900
Subject: [PATCH v1 2/2] Fix volatile function evaluation in eager aggregation
Pushing aggregates containing volatile functions below a join can
violate volatility semantics by changing the number of times the
function is executed.
Here we check the Aggref nodes in the targetlist and havingQual for
volatile functions and disable eager aggregation when such functions
are present.
---
src/backend/optimizer/plan/initsplan.c | 11 ++++++++++
src/test/regress/expected/eager_aggregate.out | 20 +++++++++++++++++++
src/test/regress/sql/eager_aggregate.sql | 8 ++++++++
3 files changed, 39 insertions(+)
diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index b207b8d913b..96ee312ebdf 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -810,6 +810,17 @@ create_agg_clause_infos(PlannerInfo *root)
Assert(aggref->aggorder == NIL);
Assert(aggref->aggdistinct == NIL);
+ /*
+ * We cannot push down aggregates that contain volatile functions.
+ * Doing so would change the number of times the function is
+ * evaluated.
+ */
+ if (contain_volatile_functions((Node *) aggref))
+ {
+ eager_agg_applicable = false;
+ break;
+ }
+
/*
* If there are any securityQuals, do not try to apply eager
* aggregation if any non-leakproof aggregate functions are present.
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
index 5ac966186f7..d1b86be3a62 100644
--- a/src/test/regress/expected/eager_aggregate.out
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -428,6 +428,26 @@ GROUP BY t1.a ORDER BY t1.a;
RESET geqo;
RESET geqo_threshold;
+-- Ensure eager aggregation is not applied because random() is a volatile
+-- function
+EXPLAIN (COSTS OFF)
+SELECT t1.a, avg(t2.c + random())
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+-----------------------------------------------------
+ GroupAggregate
+ Group Key: t1.a
+ -> Sort
+ Sort Key: t1.a
+ -> Hash Join
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on eager_agg_t2 t2
+ -> Hash
+ -> Seq Scan on eager_agg_t1 t1
+(9 rows)
+
DROP TABLE eager_agg_t1;
DROP TABLE eager_agg_t2;
DROP TABLE eager_agg_t3;
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
index abe6d6ae09f..97e10dd7cf4 100644
--- a/src/test/regress/sql/eager_aggregate.sql
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -163,6 +163,14 @@ GROUP BY t1.a ORDER BY t1.a;
RESET geqo;
RESET geqo_threshold;
+-- Ensure eager aggregation is not applied because random() is a volatile
+-- function
+EXPLAIN (COSTS OFF)
+SELECT t1.a, avg(t2.c + random())
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
DROP TABLE eager_agg_t1;
DROP TABLE eager_agg_t2;
DROP TABLE eager_agg_t3;
--
2.39.5 (Apple Git-154)
* Re: Eager aggregation, take 3
@ 2026-04-02 12:18 Matheus Alcantara <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Matheus Alcantara @ 2026-04-02 12:18 UTC (permalink / raw)
To: Richard Guo <[email protected]>; David Rowley <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Mon Mar 30, 2026 at 12:17 AM -03, Richard Guo wrote:
> On Thu, Oct 9, 2025 at 5:07 PM Richard Guo <[email protected]> wrote:
>> I noticed an unnecessary header include in initsplan.c. Will fix that
>> as well.
>
> I noticed a couple of issues that can lead to unexpected results.
> I've attached two patches to fix them.
>
> 1. Eager aggregation was incorrectly checking the data type's default
> collation rather than the expression's actual collation. This allowed
> columns with non-deterministic collations to be pushed down, resulting
> in incorrect grouping. Fixed by 0001.
>
> 2. Pushing aggregates containing volatile functions below a join
> alters their execution count. Fixed by 0002.
>
> (As briefly discussed on Discord, this non-deterministic collation
> issue also exists in our long-existing logic for pushing HAVING down
> to WHERE. But let's fix it for the eager aggregation first.)
>
Hi Richard,
The patches look good to me and work as expected. They seem very
straightforward, so I don't have any major comments.
I'm attaching some new tests that I added to the collate.icu.utf8 and
eager_aggregate regression tests during my review; feel free to include
any of them that seem helpful, or none.
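For anyone following along, the two properties behind these fixes (collation determinism and function volatility) can be inspected from the system catalogs. This is only a minimal sketch and not part of either patch: it assumes an ICU-enabled build, and ci_demo is a hypothetical collation name created just for the illustration, using the case-insensitive definition from the PostgreSQL documentation.

```sql
-- Hypothetical collation for illustration only (requires ICU support).
CREATE COLLATION ci_demo (
    provider = icu,
    locale = 'und-u-ks-level2',
    deterministic = false
);

-- collisdeterministic is false for nondeterministic collations.
SELECT collname, collisdeterministic
  FROM pg_collation
 WHERE collname IN ('ci_demo', 'C');

-- 'a' and 'A' compare equal under ci_demo but not under "C", so a group
-- formed under one collation can mix rows that a join condition under
-- the other collation would tell apart.
SELECT 'a'::text = 'A'::text COLLATE ci_demo AS equal_ci,
       'a'::text = 'A'::text COLLATE "C"     AS equal_c;

-- provolatile = 'v' marks volatile functions such as random().
SELECT proname, provolatile
  FROM pg_proc
 WHERE proname = 'random';

DROP COLLATION ci_demo;
```

The catalog columns used here (pg_collation.collisdeterministic and pg_proc.provolatile) are standard; everything else in the block is scaffolding for the example.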
--
Matheus Alcantara
EDB: https://www.enterprisedb.com
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index fbcdb7eb58c..a2dd8a34da4 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2726,6 +2726,95 @@ GROUP BY t1.id;
DROP TABLE eager_agg_t1;
DROP TABLE eager_agg_t2;
+--
+-- Test for eager aggregation with multiple columns having different collations
+--
+CREATE TABLE eager_agg_t3 (
+ id int,
+ val1 text COLLATE case_insensitive,
+ val2 text COLLATE "C"
+);
+CREATE TABLE eager_agg_t4 (
+ val1 text COLLATE case_insensitive,
+ val2 text COLLATE "C"
+);
+INSERT INTO eager_agg_t3 SELECT 1, 'a', 'x' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t3 SELECT 1, 'A', 'x' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t4 VALUES ('A', 'x');
+ANALYZE eager_agg_t3;
+ANALYZE eager_agg_t4;
+-- Ensure that eager aggregation is not used when grouping by a column with
+-- non-deterministic collation, even when other grouping columns have
+-- deterministic collations.
+EXPLAIN (COSTS OFF)
+SELECT t1.id, t1.val1, count(*)
+ FROM eager_agg_t3 t1
+ JOIN eager_agg_t4 t2 ON t1.val1 = t2.val1 COLLATE "C" AND t1.val2 = t2.val2
+GROUP BY t1.id, t1.val1;
+ QUERY PLAN
+------------------------------------------------------------------------------------
+ HashAggregate
+ Group Key: t1.id, t1.val1
+ -> Nested Loop
+ Join Filter: (((t1.val1)::text = (t2.val1)::text) AND (t1.val2 = t2.val2))
+ -> Seq Scan on eager_agg_t4 t2
+ -> Seq Scan on eager_agg_t3 t1
+(6 rows)
+
+-- Verify correct results (should return 1 row with count = 50)
+SELECT t1.id, t1.val1, count(*)
+ FROM eager_agg_t3 t1
+ JOIN eager_agg_t4 t2 ON t1.val1 = t2.val1 COLLATE "C" AND t1.val2 = t2.val2
+GROUP BY t1.id, t1.val1;
+ id | val1 | count
+----+------+-------
+ 1 | A | 50
+(1 row)
+
+DROP TABLE eager_agg_t3;
+DROP TABLE eager_agg_t4;
+--
+-- Test for eager aggregation with explicit COLLATE on grouping expression
+--
+CREATE TABLE eager_agg_t5 (id int, val text COLLATE "C");
+CREATE TABLE eager_agg_t6 (val text COLLATE "C");
+INSERT INTO eager_agg_t5 SELECT 1, 'a' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t5 SELECT 1, 'A' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t6 VALUES ('A');
+ANALYZE eager_agg_t5;
+ANALYZE eager_agg_t6;
+-- When grouping by an expression with explicit non-deterministic COLLATE,
+-- eager aggregation should not be used even if the column's native collation
+-- is deterministic.
+EXPLAIN (COSTS OFF)
+SELECT t1.id, t1.val COLLATE case_insensitive, count(*)
+ FROM eager_agg_t5 t1
+ JOIN eager_agg_t6 t2 ON t1.val = t2.val
+GROUP BY t1.id, t1.val COLLATE case_insensitive;
+ QUERY PLAN
+-----------------------------------------------
+ HashAggregate
+ Group Key: t1.id, (t1.val)::text
+ -> Hash Join
+ Hash Cond: (t1.val = t2.val)
+ -> Seq Scan on eager_agg_t5 t1
+ -> Hash
+ -> Seq Scan on eager_agg_t6 t2
+(7 rows)
+
+-- Verify correct results (should return 1 row with count = 50: the join uses
+-- the deterministic "C" collation, so only the 'A' rows of t1 match)
+SELECT t1.id, t1.val COLLATE case_insensitive, count(*)
+ FROM eager_agg_t5 t1
+ JOIN eager_agg_t6 t2 ON t1.val = t2.val
+GROUP BY t1.id, t1.val COLLATE case_insensitive;
+ id | val | count
+----+-----+-------
+ 1 | A | 50
+(1 row)
+
+DROP TABLE eager_agg_t5;
+DROP TABLE eager_agg_t6;
-- virtual generated columns
CREATE TABLE t5 (
a int,
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
index d1b86be3a62..2bf983d12cb 100644
--- a/src/test/regress/expected/eager_aggregate.out
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -448,6 +448,26 @@ GROUP BY t1.a ORDER BY t1.a;
-> Seq Scan on eager_agg_t1 t1
(9 rows)
+-- Ensure eager aggregation is not applied when FILTER clause contains
+-- volatile function
+EXPLAIN (COSTS OFF)
+SELECT t1.a, avg(t2.c) FILTER (WHERE random() > 0.5)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+ QUERY PLAN
+-----------------------------------------------------
+ GroupAggregate
+ Group Key: t1.a
+ -> Sort
+ Sort Key: t1.a
+ -> Hash Join
+ Hash Cond: (t2.b = t1.b)
+ -> Seq Scan on eager_agg_t2 t2
+ -> Hash
+ -> Seq Scan on eager_agg_t1 t1
+(9 rows)
+
DROP TABLE eager_agg_t1;
DROP TABLE eager_agg_t2;
DROP TABLE eager_agg_t3;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 0e6b76b11b8..93c22b37727 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -1021,6 +1021,76 @@ GROUP BY t1.id;
DROP TABLE eager_agg_t1;
DROP TABLE eager_agg_t2;
+--
+-- Test for eager aggregation with multiple columns having different collations
+--
+CREATE TABLE eager_agg_t3 (
+ id int,
+ val1 text COLLATE case_insensitive,
+ val2 text COLLATE "C"
+);
+CREATE TABLE eager_agg_t4 (
+ val1 text COLLATE case_insensitive,
+ val2 text COLLATE "C"
+);
+
+INSERT INTO eager_agg_t3 SELECT 1, 'a', 'x' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t3 SELECT 1, 'A', 'x' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t4 VALUES ('A', 'x');
+
+ANALYZE eager_agg_t3;
+ANALYZE eager_agg_t4;
+
+-- Ensure that eager aggregation is not used when grouping by a column with
+-- non-deterministic collation, even when other grouping columns have
+-- deterministic collations.
+EXPLAIN (COSTS OFF)
+SELECT t1.id, t1.val1, count(*)
+ FROM eager_agg_t3 t1
+ JOIN eager_agg_t4 t2 ON t1.val1 = t2.val1 COLLATE "C" AND t1.val2 = t2.val2
+GROUP BY t1.id, t1.val1;
+
+-- Verify correct results (should return 1 row with count = 50)
+SELECT t1.id, t1.val1, count(*)
+ FROM eager_agg_t3 t1
+ JOIN eager_agg_t4 t2 ON t1.val1 = t2.val1 COLLATE "C" AND t1.val2 = t2.val2
+GROUP BY t1.id, t1.val1;
+
+DROP TABLE eager_agg_t3;
+DROP TABLE eager_agg_t4;
+
+--
+-- Test for eager aggregation with explicit COLLATE on grouping expression
+--
+CREATE TABLE eager_agg_t5 (id int, val text COLLATE "C");
+CREATE TABLE eager_agg_t6 (val text COLLATE "C");
+
+INSERT INTO eager_agg_t5 SELECT 1, 'a' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t5 SELECT 1, 'A' FROM generate_series(1, 50);
+INSERT INTO eager_agg_t6 VALUES ('A');
+
+ANALYZE eager_agg_t5;
+ANALYZE eager_agg_t6;
+
+-- When grouping by an expression with explicit non-deterministic COLLATE,
+-- eager aggregation should not be used even if the column's native collation
+-- is deterministic.
+EXPLAIN (COSTS OFF)
+SELECT t1.id, t1.val COLLATE case_insensitive, count(*)
+ FROM eager_agg_t5 t1
+ JOIN eager_agg_t6 t2 ON t1.val = t2.val
+GROUP BY t1.id, t1.val COLLATE case_insensitive;
+
+-- Verify correct results (should return 1 row with count = 50: the join uses
+-- the deterministic "C" collation, so only the 'A' rows of t1 match)
+SELECT t1.id, t1.val COLLATE case_insensitive, count(*)
+ FROM eager_agg_t5 t1
+ JOIN eager_agg_t6 t2 ON t1.val = t2.val
+GROUP BY t1.id, t1.val COLLATE case_insensitive;
+
+DROP TABLE eager_agg_t5;
+DROP TABLE eager_agg_t6;
+
-- virtual generated columns
CREATE TABLE t5 (
a int,
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
index 97e10dd7cf4..9c935ef0633 100644
--- a/src/test/regress/sql/eager_aggregate.sql
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -171,6 +171,14 @@ SELECT t1.a, avg(t2.c + random())
JOIN eager_agg_t2 t2 ON t1.b = t2.b
GROUP BY t1.a ORDER BY t1.a;
+-- Ensure eager aggregation is not applied when FILTER clause contains
+-- volatile function
+EXPLAIN (COSTS OFF)
+SELECT t1.a, avg(t2.c) FILTER (WHERE random() > 0.5)
+ FROM eager_agg_t1 t1
+ JOIN eager_agg_t2 t2 ON t1.b = t2.b
+GROUP BY t1.a ORDER BY t1.a;
+
DROP TABLE eager_agg_t1;
DROP TABLE eager_agg_t2;
DROP TABLE eager_agg_t3;
Attachments:
[text/plain] more-tests.diff.nocfbot (8.0K, 2-more-tests.diff.nocfbot)
* Re: Eager aggregation, take 3
@ 2026-04-06 03:06 Richard Guo <[email protected]>
parent: Matheus Alcantara <[email protected]>
0 siblings, 0 replies; 55+ messages in thread
From: Richard Guo @ 2026-04-06 03:06 UTC (permalink / raw)
To: Matheus Alcantara <[email protected]>; +Cc: David Rowley <[email protected]>; Robert Haas <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]
On Thu, Apr 2, 2026 at 9:18 PM Matheus Alcantara
<[email protected]> wrote:
> The patches look good to me and work as expected. They seem very
> straightforward, so I don't have any major comments.
>
> I'm attaching some new tests that I added to the collate.icu.utf8 and
> eager_aggregate regression tests during my review; feel free to include
> any of them that seem helpful, or none.
Thanks for the review. I have added two of your test cases and
committed the patches.
- Richard
* Re: Eager aggregation, take 3
@ 2026-05-29 15:55 Radim Marek <[email protected]>
parent: Richard Guo <[email protected]>
2 siblings, 1 reply; 55+ messages in thread
From: Radim Marek @ 2026-05-29 15:55 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: pgsql-hackers
Hey Richard,
I might be out of my depth here, but while I was testing RegreSQL as a
correctness/performance harness on PostgreSQL, it picked up a
wrong-results case involving eager aggregation.
It reproduces on current HEAD
(commit 2670cc298f42cd7b1c426bf7ccfb0652d8e0b347 at the time of writing)
with enable_eager_aggregate enabled.
My testing environment
- Linux aarch64, gcc 12 (Debian)
- macOS arm64, Apple clang 21
(PostgreSQL 19devel on aarch64-apple-darwin25.5.0)
== How to reproduce
CREATE TEMP TABLE c(id int, country text);
CREATE TEMP TABLE o(customer_id int);
INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
INSERT INTO o VALUES (1),(3); -- only customers 1 and 3 have a row in o
SELECT c.country, count(*) AS n
FROM c
WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
GROUP BY c.country
ORDER BY c.country;
Expected results (everywhere except master)
country | n
---------+---
DE | 2
US | 1
(2 rows)
The actual result with enable_eager_aggregate = on (default)
country | n
---------+---
DE | 0
US | 0
(2 rows)
With SET enable_eager_aggregate = off, the result is correct (DE=2, US=1),
as it is on PG18.
Query Plan
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Sort (cost=108.19..108.69 rows=200 width=40) (actual time=0.195..0.197 rows=2.00 loops=1)
   Sort Key: c.country
   Sort Method: quicksort Memory: 25kB
   Buffers: local hit=2
   -> Finalize HashAggregate (cost=98.55..100.55 rows=200 width=40) (actual time=0.183..0.186 rows=2.00 loops=1)
        Group Key: c.country
        Batches: 1 Memory Usage: 32kB
        Buffers: local hit=2
        -> Hash Anti Join (cost=52.75..95.37 rows=635 width=40) (actual time=0.177..0.179 rows=3.00 loops=1)
             Hash Cond: (c.id = o.customer_id)
             Buffers: local hit=2
             -> Seq Scan on c (cost=0.00..22.70 rows=1270 width=36) (actual time=0.024..0.025 rows=5.00 loops=1)
                  Buffers: local hit=1
             -> Hash (cost=50.25..50.25 rows=200 width=12) (actual time=0.145..0.146 rows=2.00 loops=1)
                  Buckets: 1024 Batches: 1 Memory Usage: 9kB
                  Buffers: local hit=1
                  -> Partial HashAggregate (cost=48.25..50.25 rows=200 width=12) (actual time=0.122..0.123 rows=2.00 loops=1)
                       Group Key: o.customer_id
                       Batches: 1 Memory Usage: 32kB
                       Buffers: local hit=1
                       -> Seq Scan on o (cost=0.00..35.50 rows=2550 width=4) (actual time=0.002..0.003 rows=2.00 loops=1)
                            Buffers: local hit=1
 Planning Time: 0.294 ms
 Execution Time: 0.255 ms
(24 rows)
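To compare the two behaviours side by side, something like the following could be run in the same session. It is only a sketch: it assumes the temporary tables c and o from the reproduction above still exist, and it relies on nothing beyond the enable_eager_aggregate setting already mentioned in this report. No expected output is shown, since the plan shape may differ between builds.

```sql
-- Assumes temp tables c and o from the reproduction above.
SET enable_eager_aggregate = off;

-- Plan without eager aggregation, for comparison with the plan above
-- (where the Partial HashAggregate sits on the inner side of the anti join).
EXPLAIN (COSTS OFF)
SELECT c.country, count(*) AS n
  FROM c
 WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
 GROUP BY c.country
 ORDER BY c.country;

-- Same query, executed with eager aggregation disabled.
SELECT c.country, count(*) AS n
  FROM c
 WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
 GROUP BY c.country
 ORDER BY c.country;

RESET enable_eager_aggregate;
```

Running the same pair again after the RESET gives the eager-aggregation plan and result to diff against.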
If this is already known or in progress, apologies for the noise.
---
Radim
On Fri, 29 May 2026 at 17:25, Richard Guo <[email protected]> wrote:
> Hi All,
>
> Eager aggregation is a query optimization technique that partially
> pushes a group-by past a join, and finalizes it once all the relations
> are joined. Eager aggregation reduces the number of input rows to the
> join and thus may result in a better overall plan. This technique is
> thoroughly described in the 'Eager Aggregation and Lazy Aggregation'
> paper [1].
>
> Back in 2017, a patch set has been proposed by Antonin Houska to
> implement eager aggregation in thread [2]. However, it was at last
> withdrawn after entering the pattern of "please rebase thx" followed by
> rebasing and getting no feedback until "please rebase again thx". A
> second attempt in 2022 unfortunately fell into the same pattern about
> one year ago and was eventually closed again [3].
>
> That patch set has included most of the necessary concepts to implement
> eager aggregation. However, as far as I can see, it has several weak
> points that we need to address. It introduces invasive changes to some
> core planner functions, such as make_join_rel(). And with such changes
> join_is_legal() would be performed three times for the same proposed
> join, which is not great. Another weak point is that the complexity of
> join searching dramatically increases with the growing number of
> relations to be joined. This occurs because when we generate partially
> aggregated paths, each path of the input relation is considered as an
> input path for the grouped paths. As a result, the number of grouped
> paths we generate increases exponentially, leading to a significant
> explosion in computational complexity. Other weak points include the
> lack of support for outer joins and partitionwise joins. And during my
> review of the code, I came across several bugs (planning error or crash)
> that need to be addressed.
>
> I'd like to give it another take to implement eager aggregation, while
> borrowing lots of concepts and many chunks of codes from the previous
> patch set. Please see attached. I have chosen to use the term 'Eager
> Aggregation' from the paper [1] instead of 'Aggregation push-down', to
> differentiate the aggregation push-down technique in FDW.
>
> The patch has been split into small pieces to make the review easier.
>
> 0001 introduces the RelInfoList structure, which encapsulates both a
> list and a hash table, so that we can leverage the hash table for faster
> lookups not only for join relations but also for upper relations. With
> eager aggregation, it is possible that we generate so many upper rels of
> type UPPERREL_PARTIAL_GROUP_AGG that a hash table can help a lot with
> lookups.
>
> 0002 introduces the RelAggInfo structure to store information needed to
> create grouped paths for base and join rels. It also revises the
> RelInfoList related structures and functions so that they can be used
> with RelAggInfos.
>
> 0003 checks if eager aggregation is applicable, and if so, collects
> suitable aggregate expressions and grouping expressions in the query,
> and records them in root->agg_clause_list and root->group_expr_list
> respectively.
>
> 0004 implements the functions that check if eager aggregation is
> applicable for a given relation, and if so, create RelAggInfo structure
> for the relation, using the infos about aggregate expressions and
> grouping expressions we collected earlier. In this patch, when we check
> if a target expression can act as grouping expression, we need to check
> if this expression can be known equal to other expressions due to ECs
> that can act as grouping expressions. This patch leverages function
> exprs_known_equal() to achieve that, after enhancing this function to
> consider opfamily if provided.
>
> 0005 implements the functions that generate paths for grouped relations
> by adding sorted and hashed partial aggregation paths on top of paths of
> the plain base or join relations. For sorted partial aggregation paths,
> we only consider any suitably-sorted input paths as well as sorting the
> cheapest-total path. For hashed partial aggregation paths, we only
> consider the cheapest-total path as input. By not considering other
> paths we can reduce the number of grouping paths as much as possible
> while still achieving reasonable results.
>
> 0006 builds grouped relations for each base relation if possible, and
> generates aggregation paths for the grouped base relations.
>
> 0007 builds grouped relations for each just-processed join relation if
> possible, and generates aggregation paths for the grouped join
> relations. The changes made to make_join_rel() are relatively minor,
> with the addition of a new function make_grouped_join_rel(), which finds
> or creates a grouped relation for the just-processed joinrel, and
> generates grouped paths by joining a grouped input relation with a
> non-grouped input relation.
>
> The other way to generate grouped paths is by adding sorted and hashed
> partial aggregation paths on top of paths of the joinrel. This occurs
> in standard_join_search(), after we've run set_cheapest() for the
> joinrel. The reason for performing this step after set_cheapest() is
> that we need to know the joinrel's cheapest paths (see 0005).
>
> This patch also makes the grouped relation for the topmost join rel act
> as the upper rel representing the result of partial aggregation, so that
> we can add the final aggregation on top of that. Additionally, this
> patch extends the functionality of eager aggregation to work with
> partitionwise join and geqo.
>
> This patch also makes eager aggregation work with outer joins. With
> outer join, the aggregate cannot be pushed down if any column referenced
> by grouping expressions or aggregate functions is nullable by an outer
> join above the relation to which we want to apply the partial
> aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
> can easily identify such situations and subsequently refrain from
> pushing down the aggregates.
>
> Starting from this patch, you should be able to see plans with eager
> aggregation.
>
> 0008 adds test cases for eager aggregation.
>
> 0009 adds a section in README that describes this feature (copied from
> previous patch set, with minor tweaks).
>
> Thoughts and comments are welcome.
>
> [1] https://www.vldb.org/conf/1995/P345.PDF
> [2] https://www.postgresql.org/message-id/flat/9666.1491295317%40localhost
> [3] https://www.postgresql.org/message-id/flat/OS3PR01MB66609589B896FBDE190209F495EE9%40OS3PR01MB6660.jp...
>
> Thanks
> Richard
>
* Re: Eager aggregation, take 3
@ 2026-05-31 11:28 Tender Wang <[email protected]>
parent: Radim Marek <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Tender Wang @ 2026-05-31 11:28 UTC (permalink / raw)
To: Radim Marek <[email protected]>; +Cc: Richard Guo <[email protected]>; pgsql-hackers
Radim Marek <[email protected]> wrote on Fri, May 29, 2026 at 23:55:
>
> Hey Richard,
>
> I might be out of my depth here, but while testing RegreSQL as a correctness/performance harness on PostgreSQL, it picked up a wrong-results case with eager aggregation.
>
> It reproduces on current HEAD (commit 2670cc298f42cd7b1c426bf7ccfb0652d8e0b347 now) with enable_eager_aggregate enabled.
>
> My testing environment
> - Linux aarch64, gcc 12 (Debian)
> - macOS arm64, Apple clang 21
> (PostgreSQL 19devel on aarch64-apple-darwin25.5.0)
>
> == How to reproduce
>
> CREATE TEMP TABLE c(id int, country text);
> CREATE TEMP TABLE o(customer_id int);
> INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
> INSERT INTO o VALUES (1),(3); -- only customers 1 and 3 have a row in o
>
> SELECT c.country, count(*) AS n
> FROM c
> WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
> GROUP BY c.country
> ORDER BY c.country;
>
> Expected results (everywhere except master)
>
> country | n
> ---------+---
> DE | 2
> US | 1
> (2 rows)
>
> The actual result with enable_eager_aggregate = on (default)
>
> country | n
> ---------+---
> DE | 0
> US | 0
> (2 rows)
>
> With SET enable_eager_aggregate = off, the result is correct (DE=2, US=1), as it is on PG18.
>
> Query Plan
>
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=108.19..108.69 rows=200 width=40) (actual time=0.195..0.197 rows=2.00 loops=1)
> Sort Key: c.country
> Sort Method: quicksort Memory: 25kB
> Buffers: local hit=2
> -> Finalize HashAggregate (cost=98.55..100.55 rows=200 width=40) (actual time=0.183..0.186 rows=2.00 loops=1)
> Group Key: c.country
> Batches: 1 Memory Usage: 32kB
> Buffers: local hit=2
> -> Hash Anti Join (cost=52.75..95.37 rows=635 width=40) (actual time=0.177..0.179 rows=3.00 loops=1)
> Hash Cond: (c.id = o.customer_id)
> Buffers: local hit=2
> -> Seq Scan on c (cost=0.00..22.70 rows=1270 width=36) (actual time=0.024..0.025 rows=5.00 loops=1)
> Buffers: local hit=1
> -> Hash (cost=50.25..50.25 rows=200 width=12) (actual time=0.145..0.146 rows=2.00 loops=1)
> Buckets: 1024 Batches: 1 Memory Usage: 9kB
> Buffers: local hit=1
> -> Partial HashAggregate (cost=48.25..50.25 rows=200 width=12) (actual time=0.122..0.123 rows=2.00 loops=1)
> Group Key: o.customer_id
> Batches: 1 Memory Usage: 32kB
> Buffers: local hit=1
> -> Seq Scan on o (cost=0.00..35.50 rows=2550 width=4) (actual time=0.002..0.003 rows=2.00 loops=1)
> Buffers: local hit=1
> Planning Time: 0.294 ms
> Execution Time: 0.255 ms
> (24 rows)
>
> If this is already known or in progress, apologies for the noise.
Thanks for the report. This is a bug.
With eager aggregation, the partial aggregate can eliminate many tuples
on its side before the join, so far fewer rows have to be matched when
the join runs. That is exactly the effect we want from eager
aggregation.
But we need to be careful with anti-joins. An anti-join discards the
matched rows and emits only the unmatched ones, so aggregating over the
inner side gives a wrong result.
I can get wrong results from a semi-join, too.
For example:
postgres=# CREATE TEMP TABLE c(id int, country text);
CREATE TEMP TABLE o(customer_id int);
INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
INSERT INTO o VALUES (1),(3);
CREATE TABLE
CREATE TABLE
INSERT 0 5
INSERT 0 2
postgres=# insert into o values (1);
INSERT 0 1
-- correct result
postgres=# SELECT c.country, count(*) AS n
FROM c
WHERE EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
GROUP BY c.country
ORDER BY c.country;
country | n
---------+---
DE | 1
US | 1
(2 rows)
I hacked the cost of the paths created in make_grouped_join_rel() to be
very small, so that the planner picks a partial aggregation plan, as
follows:
postgres=# explain SELECT c.country, count(*) AS n
FROM c
WHERE EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
GROUP BY c.country
ORDER BY c.country;
QUERY PLAN
-----------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=31.56..38.32 rows=200 width=40)
Group Key: c.country
-> Sort (cost=31.56..33.15 rows=635 width=40)
Sort Key: c.country
-> Hash Semi Join (cost=1.00..2.00 rows=635 width=40)
Hash Cond: (c.id = o.customer_id)
-> Seq Scan on c (cost=0.00..22.70 rows=1270 width=36)
-> Hash (cost=200.91..200.91 rows=200 width=12)
-> Partial GroupAggregate (cost=179.78..200.91 rows=200 width=12)
Group Key: o.customer_id
-> Sort (cost=179.78..186.16 rows=2550 width=4)
Sort Key: o.customer_id
-> Seq Scan on o (cost=0.00..35.50 rows=2550 width=4)
(13 rows)
postgres=# SELECT c.country, count(*) AS n
FROM c
WHERE EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
GROUP BY c.country
ORDER BY c.country;
country | n
---------+---
DE | 1
US | 2
(2 rows)
You can see that the count for US is now 2, because the partial
aggregate pre-aggregates the two rows in o with customer_id = 1.
Under semi-join semantics, however, an outer row is emitted only once,
as soon as a match is found.
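To make the arithmetic behind both wrong results concrete, here is a minimal standalone C sketch. It is not planner or executor code; the names CRow, count_reference and count_inner_pushdown are made up for illustration, and the model simply assumes the final aggregate sums whatever partial count(*) state arrives from the o side of the join.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { DE = 0, US = 1, NCOUNTRY = 2 };

typedef struct { int id; int country; } CRow;  /* one row of c(id, country) */

/* Partial count(*) over o grouped by customer_id, looked up for one key. */
static long partial_count(const int *o, size_t no, int customer_id)
{
    long n = 0;

    for (size_t i = 0; i < no; i++)
        if (o[i] == customer_id)
            n++;
    return n;
}

/* Reference semantics: every qualifying row of c is counted exactly once. */
void count_reference(const CRow *c, size_t nc, const int *o, size_t no,
                     bool anti, long out[NCOUNTRY])
{
    out[DE] = out[US] = 0;
    for (size_t i = 0; i < nc; i++)
    {
        bool matched = partial_count(o, no, c[i].id) > 0;

        assert(c[i].country == DE || c[i].country == US);
        if (matched != anti)        /* semi keeps matches, anti keeps non-matches */
            out[c[i].country]++;
    }
}

/* Modelled buggy plan: the final aggregate combines partial states from o. */
void count_inner_pushdown(const CRow *c, size_t nc, const int *o, size_t no,
                          bool anti, long out[NCOUNTRY])
{
    out[DE] = out[US] = 0;
    for (size_t i = 0; i < nc; i++)
    {
        long state = partial_count(o, no, c[i].id);

        assert(c[i].country == DE || c[i].country == US);
        if (!anti && state > 0)
            out[c[i].country] += state;  /* semi join: inner group size leaks in */
        /* anti join: surviving rows carry no inner state, so nothing is added */
    }
}
```

With the five rows of c and o = {1, 3, 1}, the semi-join case gives DE=1, US=1 for the reference and DE=1, US=2 for the modelled plan. With o = {1, 3}, the anti-join case gives DE=2, US=1 versus DE=0, US=0, matching the outputs shown in this thread.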
I haven't thought about it too deeply yet. Maybe we can do something
in the make_grouped_join_rel().
...
if (sjinfo->jointype == JOIN_ANTI || sjinfo->jointype == JOIN_SEMI)
return;
...
The check above resolves these issues for now, but it seems too strict.
--
Thanks,
Tender Wang
* Re: Eager aggregation, take 3
@ 2026-06-01 07:19 Richard Guo <[email protected]>
parent: Tender Wang <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Richard Guo @ 2026-06-01 07:19 UTC (permalink / raw)
To: Tender Wang <[email protected]>; +Cc: Radim Marek <[email protected]>; pgsql-hackers
On Sun, May 31, 2026 at 8:28 PM Tender Wang <[email protected]> wrote:
> Radim Marek <[email protected]> wrote on Fri, May 29, 2026 at 23:55:
> > == How to reproduce
> >
> > CREATE TEMP TABLE c(id int, country text);
> > CREATE TEMP TABLE o(customer_id int);
> > INSERT INTO c VALUES (1,'US'),(2,'US'),(3,'DE'),(4,'DE'),(5,'DE');
> > INSERT INTO o VALUES (1),(3); -- only customers 1 and 3 have a row in o
> >
> > SELECT c.country, count(*) AS n
> > FROM c
> > WHERE NOT EXISTS (SELECT 1 FROM o WHERE o.customer_id = c.id)
> > GROUP BY c.country
> > ORDER BY c.country;
> >
> > Expected results (everywhere except master)
> >
> > country | n
> > ---------+---
> > DE | 2
> > US | 1
> > (2 rows)
> >
> > The actual result with enable_eager_aggregate = on (default)
> >
> > country | n
> > ---------+---
> > DE | 0
> > US | 0
> > (2 rows)
Thanks for the report. This is a bug. We should never push a partial
aggregation down to a relation on the inner (RHS) side of a semi/anti
join. A semi/anti join does not preserve its inner rows in the join
output, so a partial aggregate computed on the inner side would not
survive the join and could not be combined by the final aggregation.
> I haven't thought about it too deeply yet. Maybe we can do something
> in the make_grouped_join_rel().
> ...
> if (sjinfo->jointype == JOIN_ANTI || sjinfo->jointype == JOIN_SEMI)
> return;
> ...
That does fix the reported case, but I think it's too broad: it also
disables pushing a partial aggregate to the outer side of a semi/anti
join, which is valid. And by the time we reach make_grouped_join_rel
the grouped relation for the inner-side relation has already been
built, so it would just go unused.
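As a rough illustration of why the outer-side case stays valid, the following standalone C sketch compares a plain semi-join count with one where the outer rows are first partially aggregated by the join key. It is a toy model, not planner code: the function names and the assumption that join keys are small integers in [0, NKEYS) are invented for the example, and the join key doubles as the GROUP BY key, as in the regression test added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NKEYS = 4 };  /* toy assumption: join keys are ints in [0, NKEYS) */

/* True if any inner row carries the given join key. */
static bool has_match(const int *inner, size_t ninner, int key)
{
    for (size_t i = 0; i < ninner; i++)
        if (inner[i] == key)
            return true;
    return false;
}

/* Plain plan: semi join first, then count(*) grouped by the outer key. */
void semi_count_plain(const int *outer, size_t nouter,
                      const int *inner, size_t ninner, long out[NKEYS])
{
    for (int k = 0; k < NKEYS; k++)
        out[k] = 0;
    for (size_t i = 0; i < nouter; i++)
    {
        assert(outer[i] >= 0 && outer[i] < NKEYS);
        if (has_match(inner, ninner, outer[i]))
            out[outer[i]]++;
    }
}

/* Eager plan: partial count(*) on the outer side, then semi join, then finalize. */
void semi_count_eager_outer(const int *outer, size_t nouter,
                            const int *inner, size_t ninner, long out[NKEYS])
{
    long partial[NKEYS] = {0};

    for (size_t i = 0; i < nouter; i++)
    {
        assert(outer[i] >= 0 && outer[i] < NKEYS);
        partial[outer[i]]++;            /* one partial state per join key */
    }
    for (int k = 0; k < NKEYS; k++)
    {
        /* The semi join emits each outer group at most once, state intact. */
        out[k] = has_match(inner, ninner, k) ? partial[k] : 0;
    }
}
```

Both functions return the same counts even when the inner side contains duplicate keys, because the partial state travels with the outer row that the semi join preserves. In this model a key with no inner match ends up as 0, whereas in SQL that group would simply be absent from the output.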
So I'd rather fix it in eager_aggregation_possible_for_relation, right
next to the existing outer-join check, by rejecting a relation that
lies on the inner side of a semijoin/antijoin. See attached.
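The shape of that check can be sketched outside the planner with plain bitmasks standing in for Bitmapsets. Everything prefixed toy_ or Toy below is hypothetical; the real code uses SpecialJoinInfo, bms_overlap() and bms_is_subset(), and runs alongside the other tests in eager_aggregation_possible_for_relation().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset of relids: bit i set means relid i is a member. */
typedef uint64_t RelidMask;

typedef enum { TOY_JOIN_LEFT, TOY_JOIN_SEMI, TOY_JOIN_ANTI } ToyJoinType;

/* Hypothetical, cut-down analogue of SpecialJoinInfo. */
typedef struct
{
    ToyJoinType jointype;
    RelidMask   min_lefthand;   /* outer-side relids */
    RelidMask   min_righthand;  /* inner-side relids */
} ToySpecialJoin;

static bool mask_overlap(RelidMask a, RelidMask b)
{
    return (a & b) != 0;
}

/* True if every member of a is also a member of b. */
static bool mask_is_subset(RelidMask a, RelidMask b)
{
    return (a & ~b) == 0;
}

/*
 * Returns false if relids touches the inner side of some semi/anti join
 * without also covering that join's outer side.
 */
bool toy_semi_anti_check(RelidMask relids,
                         const ToySpecialJoin *joins, size_t njoins)
{
    assert(joins != NULL || njoins == 0);
    for (size_t i = 0; i < njoins; i++)
    {
        const ToySpecialJoin *sj = &joins[i];

        if (sj->jointype != TOY_JOIN_SEMI && sj->jointype != TOY_JOIN_ANTI)
            continue;
        if (mask_overlap(relids, sj->min_righthand) &&
            !mask_is_subset(sj->min_lefthand, relids))
            return false;
    }
    return true;
}
```

For the reported query, with c as relid 1 and o as relid 2, the sketch rejects {o}, accepts {c}, and accepts {c, o}. A join of type TOY_JOIN_LEFT is skipped by this test on purpose, since the nullable side of an outer join is covered by the separate nulling_relids check.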
- Richard
Attachments:
[application/octet-stream] v1-0001-Fix-eager-aggregation-for-semi-antijoin-inner-rel.patch (8.9K, 2-v1-0001-Fix-eager-aggregation-for-semi-antijoin-inner-rel.patch)
download | inline diff:
From f243991b43614e56ef2be0bd5f0c92f807cdde3e Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Mon, 1 Jun 2026 14:49:33 +0900
Subject: [PATCH v1] Fix eager aggregation for semi/antijoin inner rels
Eager aggregation pushes a partial aggregate down to a base or join
relation, to be finalized after that relation is joined with the rest
of the query. eager_aggregation_possible_for_relation() already
refuses to do this for a relation on the nullable side of an outer
join, but it failed to also refuse it for a relation on the inner side
of a semijoin or antijoin.
Such a join does not emit its inner rows, so a partial aggregate
computed on the inner side does not survive the join and cannot be
combined by the final aggregation. This can happen only for an
aggregate that references no table column, such as count(*): it is
considered computable on any relation, including the inner one,
whereas an aggregate that references a column is anchored to the outer
side and never reaches the inner relation.
The existing outer-join check did not catch this because it consults
nulling_relids, which only tracks joins that null-extend their inner
side. Semijoins and antijoins formed from EXISTS, IN, NOT EXISTS, or
NOT IN sublinks do not null-extend and carry no ojrelid, so they are
invisible to that check.
Fix by additionally rejecting any relation that includes inner-side
relations of a semijoin or antijoin but not the join's outer side.
Pushing a partial aggregate to the outer side of such a join, grouped
by the join key, remains valid and is still allowed.
---
src/backend/optimizer/README | 11 +++
src/backend/optimizer/util/relnode.c | 26 ++++++
src/test/regress/expected/eager_aggregate.out | 90 +++++++++++++++++++
src/test/regress/sql/eager_aggregate.sql | 26 ++++++
4 files changed, 153 insertions(+)
diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 6c35baceedb..78a307cc523 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1588,6 +1588,17 @@ aggregation. Pushing partial aggregation in this case may result in
the rows being grouped differently than expected, or produce incorrect
values from the aggregate functions.
+Semi joins and anti joins impose a similar restriction. Such a join
+does not preserve its inner rows in the join output, so a partial
+aggregate computed on the inner side would not survive the join and
+could not be combined by the final aggregation. We therefore do not
+push partial aggregation down to the inner side of a semi/anti join.
+(An anti join reduced from an outer join null-extends its inner side,
+so that inner relation is already excluded by the outer-join condition
+above; the case specifically addressed here is a semi/anti join that
+does not null-extend its inner side, such as one formed from an
+EXISTS, IN, NOT EXISTS, or NOT IN sublink.)
+
During the construction of the join tree, we evaluate each base or
join relation to determine if eager aggregation can be applied. If
feasible, we create a separate RelOptInfo called a "grouped relation"
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 3fc2c2f71d0..687e923c46c 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -2845,6 +2845,32 @@ eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel)
return false;
}
+ /*
+ * Similarly, we cannot push a partial aggregation down to a relation on
+ * the inner (RHS) side of a semi/anti join. A semi/anti join does not
+ * preserve its inner rows in the join output, so a partial aggregate
+ * computed on the inner side would not survive the join and could not be
+ * combined by the final aggregation.
+ *
+ * Note that an anti join reduced from an outer join null-extends its
+ * inner side, so that inner relation already carries nulling_relids and
+ * is handled by the outer-join check above. The case this check adds is
+ * a semi/anti join that does not null-extend its inner side, such as one
+ * formed from an EXISTS, IN, NOT EXISTS, or NOT IN sublink.
+ */
+ foreach(lc, root->join_info_list)
+ {
+ SpecialJoinInfo *sjinfo = lfirst_node(SpecialJoinInfo, lc);
+
+ if (sjinfo->jointype != JOIN_SEMI && sjinfo->jointype != JOIN_ANTI)
+ continue;
+
+ /* rel includes inner-side rels of this join but not its outer side */
+ if (bms_overlap(rel->relids, sjinfo->min_righthand) &&
+ !bms_is_subset(sjinfo->min_lefthand, rel->relids))
+ return false;
+ }
+
/*
* For now we don't try to support PlaceHolderVars.
*/
diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out
index 456d32eb13d..091ae48a92b 100644
--- a/src/test/regress/expected/eager_aggregate.out
+++ b/src/test/regress/expected/eager_aggregate.out
@@ -466,6 +466,96 @@ GROUP BY t1.a ORDER BY t1.a;
-> Seq Scan on eager_agg_t1 t1
(9 rows)
+-- Eager aggregation must not push a partial aggregate onto the inner side of a
+-- SEMI or ANTI join
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, count(*)
+ FROM eager_agg_t2 t2
+ WHERE NOT EXISTS (SELECT 1 FROM eager_agg_t3 t3 WHERE t3.a = t2.a)
+GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------
+ Sort
+ Output: t2.b, (count(*))
+ Sort Key: t2.b
+ -> HashAggregate
+ Output: t2.b, count(*)
+ Group Key: t2.b
+ -> Hash Anti Join
+ Output: t2.b
+ Hash Cond: (t2.a = t3.a)
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+ -> Hash
+ Output: t3.a
+ -> Seq Scan on public.eager_agg_t3 t3
+ Output: t3.a
+(15 rows)
+
+SELECT t2.b, count(*)
+ FROM eager_agg_t2 t2
+ WHERE NOT EXISTS (SELECT 1 FROM eager_agg_t3 t3 WHERE t3.a = t2.a)
+GROUP BY t2.b ORDER BY t2.b;
+ b | count
+---+-------
+ 0 | 100
+ 1 | 99
+ 2 | 99
+ 3 | 99
+ 4 | 99
+ 5 | 99
+ 6 | 99
+ 7 | 99
+ 8 | 99
+ 9 | 99
+(10 rows)
+
+-- Eager aggregation may still push a partial aggregate onto the outer side of
+-- a SEMI or ANTI join
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, count(*)
+ FROM eager_agg_t2 t2
+ WHERE EXISTS (SELECT 1 FROM eager_agg_t1 t1 WHERE t1.b = t2.b)
+GROUP BY t2.b ORDER BY t2.b;
+ QUERY PLAN
+------------------------------------------------------------------
+ Finalize GroupAggregate
+ Output: t2.b, count(*)
+ Group Key: t2.b
+ -> Sort
+ Output: t2.b, (PARTIAL count(*))
+ Sort Key: t2.b
+ -> Hash Right Semi Join
+ Output: t2.b, (PARTIAL count(*))
+ Hash Cond: (t1.b = t2.b)
+ -> Seq Scan on public.eager_agg_t1 t1
+ Output: t1.a, t1.b, t1.c
+ -> Hash
+ Output: t2.b, (PARTIAL count(*))
+ -> Partial HashAggregate
+ Output: t2.b, PARTIAL count(*)
+ Group Key: t2.b
+ -> Seq Scan on public.eager_agg_t2 t2
+ Output: t2.a, t2.b, t2.c
+(18 rows)
+
+SELECT t2.b, count(*)
+ FROM eager_agg_t2 t2
+ WHERE EXISTS (SELECT 1 FROM eager_agg_t1 t1 WHERE t1.b = t2.b)
+GROUP BY t2.b ORDER BY t2.b;
+ b | count
+---+-------
+ 1 | 100
+ 2 | 100
+ 3 | 100
+ 4 | 100
+ 5 | 100
+ 6 | 100
+ 7 | 100
+ 8 | 100
+ 9 | 100
+(9 rows)
+
DROP TABLE eager_agg_t1;
DROP TABLE eager_agg_t2;
DROP TABLE eager_agg_t3;
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
index 53d9b377a64..7bca9c524da 100644
--- a/src/test/regress/sql/eager_aggregate.sql
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -177,6 +177,32 @@ SELECT t1.a, avg(t2.c) FILTER (WHERE random() > 0.5)
JOIN eager_agg_t2 t2 ON t1.b = t2.b
GROUP BY t1.a ORDER BY t1.a;
+-- Eager aggregation must not push a partial aggregate onto the inner side of a
+-- SEMI or ANTI join
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, count(*)
+ FROM eager_agg_t2 t2
+ WHERE NOT EXISTS (SELECT 1 FROM eager_agg_t3 t3 WHERE t3.a = t2.a)
+GROUP BY t2.b ORDER BY t2.b;
+
+SELECT t2.b, count(*)
+ FROM eager_agg_t2 t2
+ WHERE NOT EXISTS (SELECT 1 FROM eager_agg_t3 t3 WHERE t3.a = t2.a)
+GROUP BY t2.b ORDER BY t2.b;
+
+-- Eager aggregation may still push a partial aggregate onto the outer side of
+-- a SEMI or ANTI join
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, count(*)
+ FROM eager_agg_t2 t2
+ WHERE EXISTS (SELECT 1 FROM eager_agg_t1 t1 WHERE t1.b = t2.b)
+GROUP BY t2.b ORDER BY t2.b;
+
+SELECT t2.b, count(*)
+ FROM eager_agg_t2 t2
+ WHERE EXISTS (SELECT 1 FROM eager_agg_t1 t1 WHERE t1.b = t2.b)
+GROUP BY t2.b ORDER BY t2.b;
+
DROP TABLE eager_agg_t1;
DROP TABLE eager_agg_t2;
DROP TABLE eager_agg_t3;
--
2.39.5 (Apple Git-154)
* Re: Eager aggregation, take 3
@ 2026-06-01 07:57 Tender Wang <[email protected]>
parent: Richard Guo <[email protected]>
0 siblings, 1 reply; 55+ messages in thread
From: Tender Wang @ 2026-06-01 07:57 UTC (permalink / raw)
To: Richard Guo <[email protected]>; +Cc: Radim Marek <[email protected]>; pgsql-hackers
Richard Guo <[email protected]> wrote on Mon, Jun 1, 2026 at 15:19:
>
> Thanks for the report. This is a bug. We should never push a partial
> aggregation down to a relation on the inner (RHS) side of a semi/anti
> join. A semi/anti join does not preserve its inner rows in the join
> output, so a partial aggregate computed on the inner side would not
> survive the join and could not be combined by the final aggregation.
>
> > I haven't thought about it too deeply yet. Maybe we can do something
> > in the make_grouped_join_rel().
> > ...
> > if (sjinfo->jointype == JOIN_ANTI || sjinfo->jointype == JOIN_SEMI)
> > return;
> > ...
>
> That does fix the reported case, but I think it's too broad: it also
> disables pushing a partial aggregate to the outer side of a semi/anti
> join, which is valid. And by the time we reach make_grouped_join_rel
> the grouped relation for the inner-side relation has already been
> built, so it would just go unused.
Yes, rejecting partial aggregation based on the jointype alone would
make us miss optimization opportunities on the outer side of a semi-
or anti-join.
> So I'd rather fix it in eager_aggregation_possible_for_relation, right
> next to the existing outer-join check, by rejecting a relation that
> lies on the inner side of a semijoin/antijoin. See attached.
The attached LGTM.
--
Thanks,
Tender Wang
* Re: Eager aggregation, take 3
@ 2026-06-03 01:25 Richard Guo <[email protected]>
parent: Tender Wang <[email protected]>
0 siblings, 0 replies; 55+ messages in thread
From: Richard Guo @ 2026-06-03 01:25 UTC (permalink / raw)
To: Tender Wang <[email protected]>; +Cc: Radim Marek <[email protected]>; pgsql-hackers
On Mon, Jun 1, 2026 at 4:57 PM Tender Wang <[email protected]> wrote:
> Richard Guo <[email protected]> wrote on Mon, Jun 1, 2026 at 15:19:
> > So I'd rather fix it in eager_aggregation_possible_for_relation, right
> > next to the existing outer-join check, by rejecting a relation that
> > lies on the inner side of a semijoin/antijoin. See attached.
> The attached LGTM.
Thanks for the review. Pushed.
And thanks to Radim for the report and the well-contained repro.
- Richard
Thread overview: 55+ messages
2024-03-04 08:27 Eager aggregation, take 3 Richard Guo <[email protected]>
2025-06-13 07:41 ` Richard Guo <[email protected]>
2025-06-26 02:01 ` Richard Guo <[email protected]>
2025-07-24 03:21 ` Richard Guo <[email protected]>
2025-08-06 07:52 ` Richard Guo <[email protected]>
2025-08-06 13:44 ` Matheus Alcantara <[email protected]>
2025-08-09 01:32 ` Richard Guo <[email protected]>
2025-08-14 19:22 ` Matheus Alcantara <[email protected]>
2025-08-15 01:41 ` Richard Guo <[email protected]>
2025-09-01 01:32 ` Richard Guo <[email protected]>
2025-09-05 07:35 ` Richard Guo <[email protected]>
2025-09-05 14:37 ` Robert Haas <[email protected]>
2025-09-09 10:30 ` Richard Guo <[email protected]>
2025-09-09 14:30 ` Robert Haas <[email protected]>
2025-09-05 13:12 ` Robert Haas <[email protected]>
2025-09-09 09:20 ` Richard Guo <[email protected]>
2025-09-09 14:20 ` Robert Haas <[email protected]>
2025-09-12 09:34 ` Richard Guo <[email protected]>
2025-09-12 18:47 ` Robert Haas <[email protected]>
2025-09-13 08:27 ` Richard Guo <[email protected]>
2025-09-25 04:23 ` Richard Guo <[email protected]>
2025-09-29 02:09 ` Richard Guo <[email protected]>
2025-10-01 23:54 ` Matheus Alcantara <[email protected]>
2025-10-02 01:13 ` Richard Guo <[email protected]>
2025-10-02 01:39 ` Richard Guo <[email protected]>
2025-10-02 08:49 ` Richard Guo <[email protected]>
2025-10-02 18:40 ` Matheus Alcantara <[email protected]>
2025-10-03 03:14 ` Richard Guo <[email protected]>
2025-10-03 20:03 ` Matheus Alcantara <[email protected]>
2025-10-06 00:56 ` Richard Guo <[email protected]>
2025-10-06 00:59 ` Richard Guo <[email protected]>
2025-10-06 13:59 ` David Rowley <[email protected]>
2025-10-07 10:56 ` Richard Guo <[email protected]>
2025-10-08 11:14 ` David Rowley <[email protected]>
2025-10-09 01:49 ` Richard Guo <[email protected]>
2025-10-09 08:07 ` Richard Guo <[email protected]>
2026-03-30 03:17 ` Richard Guo <[email protected]>
2026-04-02 12:18 ` Matheus Alcantara <[email protected]>
2026-04-06 03:06 ` Richard Guo <[email protected]>
2025-10-08 14:45 ` Robert Haas <[email protected]>
2025-10-09 01:51 ` Richard Guo <[email protected]>
2025-10-09 01:48 ` Richard Guo <[email protected]>
2025-10-09 05:09 ` Antonin Houska <[email protected]>
2025-10-09 07:01 ` Richard Guo <[email protected]>
2025-09-05 14:50 ` Robert Haas <[email protected]>
2025-09-09 11:18 ` Richard Guo <[email protected]>
2025-09-05 13:09 ` Robert Haas <[email protected]>
2025-09-09 09:07 ` Richard Guo <[email protected]>
2025-10-09 02:13 ` Tom Lane <[email protected]>
2025-10-09 03:10 ` Richard Guo <[email protected]>
2026-05-29 15:55 ` Radim Marek <[email protected]>
2026-05-31 11:28 ` Tender Wang <[email protected]>
2026-06-01 07:19 ` Richard Guo <[email protected]>
2026-06-01 07:57 ` Tender Wang <[email protected]>
2026-06-03 01:25 ` Richard Guo <[email protected]>