Re: Eager aggregation, take 3 (30+ messages / 5 participants)
* Re: Eager aggregation, take 3
@ 2024-07-03 08:29 Richard Guo <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Guo @ 2024-07-03 08:29 UTC
To: Andy Fan <[email protected]>; +Cc: pgsql-hackers; [email protected]

On Thu, Jun 13, 2024 at 4:07 PM Richard Guo <[email protected]> wrote:
> I spent some time testing this patchset and found a few more issues.
> ...
> Hence here is the v8 patchset, with fixes for all the above issues.

I found an 'ORDER/GROUP BY expression not found in targetlist' error
with this patchset, with the query below:

create table t (a boolean);

set enable_eager_aggregate to on;

explain (costs off)
select min(1) from t t1 left join t t2 on t1.a group by (not (not t1.a)), t1.a order by t1.a;
ERROR:  ORDER/GROUP BY expression not found in targetlist

This happens because the two grouping items are actually the same and
standard_qp_callback would remove one of them.  The fully-processed
groupClause is kept in root->processed_groupClause.  However, when
collecting grouping expressions in create_grouping_expr_infos, we are
checking parse->groupClause, which is incorrect.

The fix is straightforward: check root->processed_groupClause instead.

Here is a new rebase with this fix.

Thanks
Richard

Attachments:

[application/octet-stream] v9-0001-Introduce-RelInfoList-structure.patch (14.3K, 2-v9-0001-Introduce-RelInfoList-structure.patch)

From af0a498a243684478c2b08d9cb1dcf2d5a979a93 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v9 01/10] Introduce RelInfoList structure

This commit introduces the RelInfoList structure, which encapsulates
both a list and a hash table, so that we can leverage the hash table for
faster lookups not only for join relations but also for upper relations.
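For readers who want the idea without the planner context, the list-plus-hash
lookup described in this commit message can be sketched in standalone C. This
is only an illustration, not the patch's code: HashedList, Item, list_add and
list_find are hypothetical names, a plain uint32_t bitmask stands in for
Relids, and a small open-addressing table stands in for PostgreSQL's dynahash.
What it takes from the patch is the shape of the design: entries are always
appended to the list, the hash table is built lazily on lookup once the list
holds more than 32 entries, and later additions are mirrored into the hash
table if it exists.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define HASH_THRESHOLD 32       /* same cutoff as the patch's "> 32" test */
#define INITIAL_SLOTS  128      /* illustrative; must be a power of two */

typedef struct Item
{
    uint32_t    relids;         /* stand-in for a Relids bitmapset (the key) */
    int         payload;        /* stand-in for the RelOptInfo contents */
} Item;

typedef struct HashedList
{
    Item      **items;          /* append-only list, always maintained */
    size_t      nitems;
    size_t      cap;
    Item      **hash;           /* NULL until the list grows "too long" */
    size_t      nslots;
} HashedList;

/* Insert into an open-addressing table using linear probing. */
static void
hash_insert(Item **slots, size_t nslots, Item *item)
{
    size_t      i = (item->relids * 2654435761u) & (nslots - 1);

    while (slots[i] != NULL)
        i = (i + 1) & (nslots - 1);
    slots[i] = item;
}

/* (Re)build the hash table from everything currently in the list. */
static void
build_hash(HashedList *list, size_t nslots)
{
    Item      **slots = calloc(nslots, sizeof(Item *));

    assert(slots != NULL);
    for (size_t i = 0; i < list->nitems; i++)
        hash_insert(slots, nslots, list->items[i]);
    free(list->hash);
    list->hash = slots;
    list->nslots = nslots;
}

/* Append to the list; mirror into the hash table only if one exists. */
void
list_add(HashedList *list, Item *item)
{
    if (list->nitems == list->cap)
    {
        size_t      newcap = list->cap ? list->cap * 2 : 8;
        Item      **p = realloc(list->items, newcap * sizeof(Item *));

        assert(p != NULL);
        list->items = p;
        list->cap = newcap;
    }
    list->items[list->nitems++] = item;

    if (list->hash)
    {
        /* keep the table at most half full so probing always terminates */
        if (list->nitems * 2 > list->nslots)
            build_hash(list, list->nslots * 2);
        else
            hash_insert(list->hash, list->nslots, item);
    }
}

/* Look up by key: linear scan for short lists, hash probe for long ones. */
Item *
list_find(HashedList *list, uint32_t relids)
{
    if (list->hash == NULL && list->nitems > HASH_THRESHOLD)
        build_hash(list, INITIAL_SLOTS);

    if (list->hash)
    {
        size_t      i = (relids * 2654435761u) & (list->nslots - 1);

        while (list->hash[i] != NULL)
        {
            if (list->hash[i]->relids == relids)
                return list->hash[i];
            i = (i + 1) & (list->nslots - 1);
        }
        return NULL;
    }

    for (size_t i = 0; i < list->nitems; i++)
    {
        if (list->items[i]->relids == relids)
            return list->items[i];
    }
    return NULL;
}
```

Keeping the list authoritative and treating the hash table as an optional
index is what lets a caller such as GEQO truncate the list and temporarily
detach the hash pointer, as the geqo_eval.c hunk does.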
--- contrib/postgres_fdw/postgres_fdw.c | 3 +- src/backend/optimizer/geqo/geqo_eval.c | 20 +-- src/backend/optimizer/path/allpaths.c | 7 +- src/backend/optimizer/plan/planmain.c | 5 +- src/backend/optimizer/util/relnode.c | 164 ++++++++++++++----------- src/include/nodes/pathnodes.h | 31 +++-- 6 files changed, 133 insertions(+), 97 deletions(-) diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 0bb9a5ae8f..e82e1bb558 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -6069,7 +6069,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype, */ Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */ fpinfo->relation_index = - list_length(root->parse->rtable) + list_length(root->join_rel_list); + list_length(root->parse->rtable) + + list_length(root->join_rel_list->items); return true; } diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c index d2f7f4e5f3..1141156899 100644 --- a/src/backend/optimizer/geqo/geqo_eval.c +++ b/src/backend/optimizer/geqo/geqo_eval.c @@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) * truncating the list to its original length. NOTE this assumes that any * added entries are appended at the end! * - * We also must take care not to mess up the outer join_rel_hash, if there - * is one. We can do this by just temporarily setting the link to NULL. - * (If we are dealing with enough join rels, which we very likely are, a - * new hash table will get built and used locally.) + * We also must take care not to mess up the outer join_rel_list->hash, if + * there is one. We can do this by just temporarily setting the link to + * NULL. (If we are dealing with enough join rels, which we very likely + * are, a new hash table will get built and used locally.) * * join_rel_level[] shouldn't be in use, so just Assert it isn't. 
*/ - savelength = list_length(root->join_rel_list); - savehash = root->join_rel_hash; + savelength = list_length(root->join_rel_list->items); + savehash = root->join_rel_list->hash; Assert(root->join_rel_level == NULL); - root->join_rel_hash = NULL; + root->join_rel_list->hash = NULL; /* construct the best path for the given combination of relations */ joinrel = gimme_tree(root, tour, num_gene); @@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) * Restore join_rel_list to its former state, and put back original * hashtable if any. */ - root->join_rel_list = list_truncate(root->join_rel_list, - savelength); - root->join_rel_hash = savehash; + root->join_rel_list->items = list_truncate(root->join_rel_list->items, + savelength); + root->join_rel_list->hash = savehash; /* release all the memory acquired within gimme_tree */ MemoryContextSwitchTo(oldcxt); diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 4895cee994..70e2b58d8f 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -3403,9 +3403,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist) * needed for these paths need have been instantiated. * * Note to plugin authors: the functions invoked during standard_join_search() - * modify root->join_rel_list and root->join_rel_hash. If you want to do more - * than one join-order search, you'll probably need to save and restore the - * original states of those data structures. See geqo_eval() for an example. + * modify root->join_rel_list->items and root->join_rel_list->hash. If you + * want to do more than one join-order search, you'll probably need to save and + * restore the original states of those data structures. See geqo_eval() for + * an example. 
*/ RelOptInfo * standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index e17d31a5c3..fd8b2b0ca3 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -64,8 +64,9 @@ query_planner(PlannerInfo *root, * NOTE: append_rel_list was set up by subquery_planner, so do not touch * here. */ - root->join_rel_list = NIL; - root->join_rel_hash = NULL; + root->join_rel_list = makeNode(RelInfoList); + root->join_rel_list->items = NIL; + root->join_rel_list->hash = NULL; root->join_rel_level = NULL; root->join_cur_level = 0; root->canon_pathkeys = NIL; diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index e05b21c884..8279ab0e11 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -35,11 +35,15 @@ #include "utils/lsyscache.h" -typedef struct JoinHashEntry +/* + * An entry of a hash table that we use to make lookup for RelOptInfo + * structures more efficient. + */ +typedef struct RelInfoEntry { - Relids join_relids; /* hash key --- MUST BE FIRST */ - RelOptInfo *join_rel; -} JoinHashEntry; + Relids relids; /* hash key --- MUST BE FIRST */ + RelOptInfo *rel; +} RelInfoEntry; static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *input_rel, @@ -479,11 +483,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid) } /* - * build_join_rel_hash - * Construct the auxiliary hash table for join relations. + * build_rel_hash + * Construct the auxiliary hash table for relations. 
*/ static void -build_join_rel_hash(PlannerInfo *root) +build_rel_hash(RelInfoList *list) { HTAB *hashtab; HASHCTL hash_ctl; @@ -491,47 +495,49 @@ build_join_rel_hash(PlannerInfo *root) /* Create the hash table */ hash_ctl.keysize = sizeof(Relids); - hash_ctl.entrysize = sizeof(JoinHashEntry); + hash_ctl.entrysize = sizeof(RelInfoEntry); hash_ctl.hash = bitmap_hash; hash_ctl.match = bitmap_match; hash_ctl.hcxt = CurrentMemoryContext; - hashtab = hash_create("JoinRelHashTable", + hashtab = hash_create("RelHashTable", 256L, &hash_ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); - /* Insert all the already-existing joinrels */ - foreach(l, root->join_rel_list) + /* Insert all the already-existing relations */ + foreach(l, list->items) { RelOptInfo *rel = (RelOptInfo *) lfirst(l); - JoinHashEntry *hentry; + RelInfoEntry *hentry; bool found; - hentry = (JoinHashEntry *) hash_search(hashtab, - &(rel->relids), - HASH_ENTER, - &found); + hentry = (RelInfoEntry *) hash_search(hashtab, + &(rel->relids), + HASH_ENTER, + &found); Assert(!found); - hentry->join_rel = rel; + hentry->rel = rel; } - root->join_rel_hash = hashtab; + list->hash = hashtab; } /* - * find_join_rel - * Returns relation entry corresponding to 'relids' (a set of RT indexes), - * or NULL if none exists. This is for join relations. + * find_rel_info + * Find an RelOptInfo entry. */ -RelOptInfo * -find_join_rel(PlannerInfo *root, Relids relids) +static RelOptInfo * +find_rel_info(RelInfoList *list, Relids relids) { + if (list == NULL) + return NULL; + /* * Switch to using hash lookup when list grows "too long". The threshold * is arbitrary and is known only here. */ - if (!root->join_rel_hash && list_length(root->join_rel_list) > 32) - build_join_rel_hash(root); + if (!list->hash && list_length(list->items) > 32) + build_rel_hash(list); /* * Use either hashtable lookup or linear search, as appropriate. 
@@ -541,23 +547,23 @@ find_join_rel(PlannerInfo *root, Relids relids) * so would force relids out of a register and thus probably slow down the * list-search case. */ - if (root->join_rel_hash) + if (list->hash) { Relids hashkey = relids; - JoinHashEntry *hentry; + RelInfoEntry *hentry; - hentry = (JoinHashEntry *) hash_search(root->join_rel_hash, - &hashkey, - HASH_FIND, - NULL); + hentry = (RelInfoEntry *) hash_search(list->hash, + &hashkey, + HASH_FIND, + NULL); if (hentry) - return hentry->join_rel; + return hentry->rel; } else { ListCell *l; - foreach(l, root->join_rel_list) + foreach(l, list->items) { RelOptInfo *rel = (RelOptInfo *) lfirst(l); @@ -569,6 +575,54 @@ find_join_rel(PlannerInfo *root, Relids relids) return NULL; } +/* + * find_join_rel + * Returns relation entry corresponding to 'relids' (a set of RT indexes), + * or NULL if none exists. This is for join relations. + */ +RelOptInfo * +find_join_rel(PlannerInfo *root, Relids relids) +{ + return find_rel_info(root->join_rel_list, relids); +} + +/* + * add_rel_info + * Add given relation to the given list. Also add it to the auxiliary + * hashtable if there is one. + */ +static void +add_rel_info(RelInfoList *list, RelOptInfo *rel) +{ + /* GEQO requires us to append the new relation to the end of the list! */ + list->items = lappend(list->items, rel); + + /* store it into the auxiliary hashtable if there is one. */ + if (list->hash) + { + RelInfoEntry *hentry; + bool found; + + hentry = (RelInfoEntry *) hash_search(list->hash, + &(rel->relids), + HASH_ENTER, + &found); + Assert(!found); + hentry->rel = rel; + } +} + +/* + * add_join_rel + * Add given join relation to the list of join relations in the given + * PlannerInfo. 
+ */ +static void +add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) +{ + add_rel_info(root->join_rel_list, joinrel); +} + /* * set_foreign_rel_properties * Set up foreign-join fields if outer and inner relation are foreign @@ -618,32 +672,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel, } } -/* - * add_join_rel - * Add given join relation to the list of join relations in the given - * PlannerInfo. Also add it to the auxiliary hashtable if there is one. - */ -static void -add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) -{ - /* GEQO requires us to append the new joinrel to the end of the list! */ - root->join_rel_list = lappend(root->join_rel_list, joinrel); - - /* store it into the auxiliary hashtable if there is one. */ - if (root->join_rel_hash) - { - JoinHashEntry *hentry; - bool found; - - hentry = (JoinHashEntry *) hash_search(root->join_rel_hash, - &(joinrel->relids), - HASH_ENTER, - &found); - Assert(!found); - hentry->join_rel = joinrel; - } -} - /* * build_join_rel * Returns relation entry corresponding to the union of two given rels, @@ -1469,22 +1497,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel, RelOptInfo * fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids) { + RelInfoList *list = &root->upper_rels[kind]; RelOptInfo *upperrel; - ListCell *lc; - - /* - * For the moment, our indexing data structure is just a List for each - * relation kind. If we ever get so many of one kind that this stops - * working well, we can improve it. No code outside this function should - * assume anything about how to find a particular upperrel. 
- */ /* If we already made this upperrel for the query, return it */ - foreach(lc, root->upper_rels[kind]) + if (list) { - upperrel = (RelOptInfo *) lfirst(lc); - - if (bms_equal(upperrel->relids, relids)) + upperrel = find_rel_info(list, relids); + if (upperrel) return upperrel; } @@ -1503,7 +1523,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids) upperrel->cheapest_unique_path = NULL; upperrel->cheapest_parameterized_paths = NIL; - root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel); + add_rel_info(&root->upper_rels[kind], upperrel); return upperrel; } diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 2ba297c117..0805de64d5 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -80,6 +80,25 @@ typedef enum UpperRelationKind /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */ } UpperRelationKind; +/* + * Hashed list to store relation specific info and to retrieve it by relids. + * + * For small problems we just scan the list to do lookups, but when there are + * many relations we build a hash table for faster lookups. The hash table is + * present and valid when 'hash' is not NULL. Note that we still maintain the + * list even when using the hash table for lookups; this simplifies life for + * GEQO. + */ +typedef struct RelInfoList +{ + pg_node_attr(no_copy_equal, no_read) + + NodeTag type; + + List *items; + struct HTAB *hash pg_node_attr(read_write_ignore); +} RelInfoList; + /*---------- * PlannerGlobal * Global information for planning/optimization @@ -270,15 +289,9 @@ struct PlannerInfo /* * join_rel_list is a list of all join-relation RelOptInfos we have - * considered in this planning run. For small problems we just scan the - * list to do lookups, but when there are many join relations we build a - * hash table for faster lookups. The hash table is present and valid - * when join_rel_hash is not NULL. 
Note that we still maintain the list - * even when using the hash table for lookups; this simplifies life for - * GEQO. + * considered in this planning run. */ - List *join_rel_list; - struct HTAB *join_rel_hash pg_node_attr(read_write_ignore); + RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */ /* * When doing a dynamic-programming-style join search, join_rel_level[k] @@ -413,7 +426,7 @@ struct PlannerInfo * Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular * upper rel. */ - List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); + RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); /* Result tlists chosen by grouping_planner for upper-stage processing */ struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); -- 2.43.0 [application/octet-stream] v9-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch (7.8K, 3-v9-0002-Introduce-RelAggInfo-structure-to-store-info-for-grouped-paths.patch) download | inline diff: From 06e29a2206817a810baa4f9155f2d0732885a0f9 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Tue, 11 Jun 2024 16:01:26 +0900 Subject: [PATCH v9 02/10] Introduce RelAggInfo structure to store info for grouped paths This commit introduces RelAggInfo structure to store information needed to create grouped paths for base and join rels. It also revises the RelInfoList related structures and functions so that they can be used with RelAggInfos. 
--- src/backend/optimizer/util/relnode.c | 66 +++++++++++++++++-------- src/include/nodes/pathnodes.h | 73 ++++++++++++++++++++++++++++ 2 files changed, 118 insertions(+), 21 deletions(-) diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 8279ab0e11..8420b8936e 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -36,13 +36,13 @@ /* - * An entry of a hash table that we use to make lookup for RelOptInfo - * structures more efficient. + * An entry of a hash table that we use to make lookup for RelOptInfo or + * RelAggInfo structures more efficient. */ typedef struct RelInfoEntry { Relids relids; /* hash key --- MUST BE FIRST */ - RelOptInfo *rel; + void *data; } RelInfoEntry; static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, @@ -484,7 +484,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid) /* * build_rel_hash - * Construct the auxiliary hash table for relations. + * Construct the auxiliary hash table for relation specific data. */ static void build_rel_hash(RelInfoList *list) @@ -504,19 +504,27 @@ build_rel_hash(RelInfoList *list) &hash_ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); - /* Insert all the already-existing relations */ + /* Insert all the already-existing relation specific infos */ foreach(l, list->items) { - RelOptInfo *rel = (RelOptInfo *) lfirst(l); + void *item = lfirst(l); RelInfoEntry *hentry; bool found; + Relids relids; + + Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo)); + + if (IsA(item, RelOptInfo)) + relids = ((RelOptInfo *) item)->relids; + else + relids = ((RelAggInfo *) item)->relids; hentry = (RelInfoEntry *) hash_search(hashtab, - &(rel->relids), + &relids, HASH_ENTER, &found); Assert(!found); - hentry->rel = rel; + hentry->data = item; } list->hash = hashtab; @@ -524,9 +532,9 @@ build_rel_hash(RelInfoList *list) /* * find_rel_info - * Find an RelOptInfo entry. 
+ * Find an RelOptInfo or a RelAggInfo entry. */ -static RelOptInfo * +static void * find_rel_info(RelInfoList *list, Relids relids) { if (list == NULL) @@ -557,7 +565,7 @@ find_rel_info(RelInfoList *list, Relids relids) HASH_FIND, NULL); if (hentry) - return hentry->rel; + return hentry->data; } else { @@ -565,10 +573,18 @@ find_rel_info(RelInfoList *list, Relids relids) foreach(l, list->items) { - RelOptInfo *rel = (RelOptInfo *) lfirst(l); + void *item = lfirst(l); + Relids item_relids = NULL; + + Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo)); - if (bms_equal(rel->relids, relids)) - return rel; + if (IsA(item, RelOptInfo)) + item_relids = ((RelOptInfo *) item)->relids; + else if (IsA(item, RelAggInfo)) + item_relids = ((RelAggInfo *) item)->relids; + + if (bms_equal(item_relids, relids)) + return item; } } @@ -583,32 +599,40 @@ find_rel_info(RelInfoList *list, Relids relids) RelOptInfo * find_join_rel(PlannerInfo *root, Relids relids) { - return find_rel_info(root->join_rel_list, relids); + return (RelOptInfo *) find_rel_info(root->join_rel_list, relids); } /* * add_rel_info - * Add given relation to the given list. Also add it to the auxiliary + * Add relation specific info to a list, and also add it to the auxiliary * hashtable if there is one. */ static void -add_rel_info(RelInfoList *list, RelOptInfo *rel) +add_rel_info(RelInfoList *list, void *data) { + Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo)); + /* GEQO requires us to append the new relation to the end of the list! */ - list->items = lappend(list->items, rel); + list->items = lappend(list->items, data); /* store it into the auxiliary hashtable if there is one. 
*/ if (list->hash) { + Relids relids; RelInfoEntry *hentry; bool found; + if (IsA(data, RelOptInfo)) + relids = ((RelOptInfo *) data)->relids; + else + relids = ((RelAggInfo *) data)->relids; + hentry = (RelInfoEntry *) hash_search(list->hash, - &(rel->relids), + &relids, HASH_ENTER, &found); Assert(!found); - hentry->rel = rel; + hentry->data = data; } } @@ -1503,7 +1527,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids) /* If we already made this upperrel for the query, return it */ if (list) { - upperrel = find_rel_info(list, relids); + upperrel = (RelOptInfo *) find_rel_info(list, relids); if (upperrel) return upperrel; } diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 0805de64d5..18d1ae8cbc 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -1078,6 +1078,79 @@ typedef struct RelOptInfo ((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \ (rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs) +/* + * RelAggInfo + * Information needed to create grouped paths for base and join rels. + * + * "relids" is the set of relation identifiers (RT indexes), just like with + * RelOptInfo. + * + * "target" will be used as pathtarget if partial aggregation is applied to + * base relation or join. The same target will also --- if the relation is a + * join --- be used to join grouped path to a non-grouped one. This target can + * contain plain-Var grouping expressions and Aggref nodes. + * + * Note: There's a convention that Aggref expressions are supposed to follow + * the other expressions of the target. Iterations of ->exprs may rely on this + * arrangement. + * + * "agg_input" contains Vars used either as grouping expressions or aggregate + * arguments. Paths providing the aggregation plan with input data should use + * this target. The only difference from reltarget of the non-grouped relation + * is that some items can have sortgroupref initialized. 
+ * + * "input_rows" is the estimated number of input rows for AggPath. It's + * actually just a workspace for users of the structure, i.e. not initialized + * when instance of the structure is created. + * + * "grouped_rows" is the estimated number of result rows of the AggPath. + * + * "group_clauses", "group_exprs" and "group_pathkeys" are lists of + * SortGroupClause, the corresponding grouping expressions and PathKey + * respectively. + * + * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's + * paths. + */ +typedef struct RelAggInfo +{ + pg_node_attr(no_copy_equal, no_read, no_query_jumble) + + NodeTag type; + + /* + * the same as in RelOptInfo; set of base + OJ relids (rangetable indexes) + */ + Relids relids; + + /* + * the targetlist for Paths scanning this grouped rel; list of Vars/Exprs, + * cost, width + */ + struct PathTarget *target; + + /* + * the targetlist for Paths that generate input for the grouped paths + */ + struct PathTarget *agg_input; + + /* estimated number of input tuples for the grouped paths */ + Cardinality input_rows; + + /* estimated number of result tuples of the grouped relation*/ + Cardinality grouped_rows; + + /* a list of SortGroupClause's */ + List *group_clauses; + /* a list of grouping expressions */ + List *group_exprs; + /* a list of PathKeys */ + List *group_pathkeys; + + /* a list of Aggref nodes */ + List *agg_exprs; +} RelAggInfo; + /* * IndexOptInfo * Per-index information for planning/optimization -- 2.43.0 [application/octet-stream] v9-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch (14.4K, 4-v9-0003-Set-up-for-eager-aggregation-by-collecting-needed-infos.patch) download | inline diff: From 5fc05389ba521c3ee73edda770fab39756f298be Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Tue, 11 Jun 2024 16:03:00 +0900 Subject: [PATCH v9 03/10] Set up for eager aggregation by collecting needed infos This commit checks if eager aggregation is applicable, and if 
so, sets up root->agg_clause_list and root->group_expr_list by collecting suitable aggregate expressions and grouping expressions in the query. --- src/backend/optimizer/path/allpaths.c | 1 + src/backend/optimizer/plan/initsplan.c | 250 ++++++++++++++++++ src/backend/optimizer/plan/planmain.c | 8 + src/backend/utils/misc/guc_tables.c | 10 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/nodes/pathnodes.h | 41 +++ src/include/optimizer/paths.h | 1 + src/include/optimizer/planmain.h | 1 + src/test/regress/expected/sysviews.out | 3 +- 9 files changed, 315 insertions(+), 1 deletion(-) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 70e2b58d8f..d1b974367b 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -77,6 +77,7 @@ typedef enum pushdown_safe_type /* These parameters are set by GUC */ bool enable_geqo = false; /* just in case GUC doesn't set it */ +bool enable_eager_aggregate = false; int geqo_threshold; int min_parallel_table_scan_size; int min_parallel_index_scan_size; diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index e2c68fe6f9..4e51213410 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -14,6 +14,7 @@ */ #include "postgres.h" +#include "access/nbtree.h" #include "catalog/pg_type.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" @@ -80,6 +81,8 @@ typedef struct JoinTreeItem } JoinTreeItem; +static void create_agg_clause_infos(PlannerInfo *root); +static void create_grouping_expr_infos(PlannerInfo *root); static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex); static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode, @@ -327,6 +330,253 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars, } } +/* + * setup_eager_aggregation + * Check if eager aggregation is applicable, and if so collect suitable + 
* aggregate expressions and grouping expressions in the query. + */ +void +setup_eager_aggregation(PlannerInfo *root) +{ + /* + * Don't apply eager aggregation if disabled by user. + */ + if (!enable_eager_aggregate) + return; + + /* + * Don't apply eager aggregation if there are no GROUP BY clauses. + */ + if (!root->parse->groupClause) + return; + + /* + * For now we don't try to support grouping sets. + */ + if (root->parse->groupingSets) + return; + + /* + * For now we don't try to support DISTINCT or ORDER BY aggregates. + */ + if (root->numOrderedAggs > 0) + return; + + /* + * If there are any aggregates that do not support partial mode, or any + * partial aggregates that are non-serializable, do not apply eager + * aggregation. + */ + if (root->hasNonPartialAggs || root->hasNonSerialAggs) + return; + + /* + * SRF is not allowed in the aggregate argument and we don't even want it + * in the GROUP BY clause, so forbid it in general. It needs to be + * analyzed if evaluation of a GROUP BY clause containing SRF below the + * query targetlist would be correct. Currently it does not seem to be an + * important use case. + */ + if (root->parse->hasTargetSRFs) + return; + + /* + * Collect aggregate expressions that appear in targetlist and having + * clauses. + */ + create_agg_clause_infos(root); + + /* + * If there are no suitable aggregate expressions, we cannot apply eager + * aggregation. + */ + if (root->agg_clause_list == NIL) + return; + + /* + * Collect grouping expressions that appear in grouping clauses. + */ + create_grouping_expr_infos(root); +} + +/* + * Create AggClauseInfo for each aggregate. + * + * If any aggregate is not suitable, set root->agg_clause_list to NIL and + * return. 
+ */ +static void +create_agg_clause_infos(PlannerInfo *root) +{ + List *tlist_exprs; + ListCell *lc; + + Assert(root->agg_clause_list == NIL); + + tlist_exprs = pull_var_clause((Node *) root->processed_tlist, + PVC_INCLUDE_AGGREGATES | + PVC_RECURSE_WINDOWFUNCS | + PVC_RECURSE_PLACEHOLDERS); + + /* + * For now we don't try to support GROUPING() expressions. + */ + foreach(lc, tlist_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + + if (IsA(expr, GroupingFunc)) + return; + } + + /* + * Aggregates within the HAVING clause need to be processed in the same way + * as those in the targetlist. Note that HAVING can contain Aggrefs but + * not WindowFuncs. + */ + if (root->parse->havingQual != NULL) + { + List *having_exprs; + + having_exprs = pull_var_clause((Node *) root->parse->havingQual, + PVC_INCLUDE_AGGREGATES | + PVC_RECURSE_PLACEHOLDERS); + if (having_exprs != NIL) + { + tlist_exprs = list_concat(tlist_exprs, having_exprs); + list_free(having_exprs); + } + } + + foreach(lc, tlist_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + Aggref *aggref; + AggClauseInfo *ac_info; + + /* + * tlist_exprs may also contain Vars, but we only need Aggrefs. + */ + if (IsA(expr, Var)) + continue; + + aggref = castNode(Aggref, expr); + + Assert(aggref->aggorder == NIL); + Assert(aggref->aggdistinct == NIL); + + ac_info = makeNode(AggClauseInfo); + ac_info->aggref = aggref; + ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref); + + root->agg_clause_list = + list_append_unique(root->agg_clause_list, ac_info); + } + + list_free(tlist_exprs); +} + +/* + * Create GroupExprInfo for each expression usable as grouping key. + * + * If any grouping expression is not suitable, set root->group_expr_list to NIL + * and return. 
+ */ +static void +create_grouping_expr_infos(PlannerInfo *root) +{ + List *exprs = NIL; + List *sortgrouprefs = NIL; + List *btree_opfamilies = NIL; + ListCell *lc, + *lc1, + *lc2, + *lc3; + + Assert(root->group_expr_list == NIL); + + foreach(lc, root->processed_groupClause) + { + SortGroupClause *sgc = lfirst_node(SortGroupClause, lc); + TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist); + TypeCacheEntry *tce; + Oid equalimageproc; + Oid eq_op; + List *eq_opfamilies; + Oid btree_opfamily; + + Assert(tle->ressortgroupref > 0); + + /* + * For now we only support plain Vars as grouping expressions. + */ + if (!IsA(tle->expr, Var)) + return; + + /* + * Eager aggregation is only possible if equality of grouping keys + * per the equality operator implies bitwise equality. Otherwise, if + * we put keys of different byte images into the same group, we lose + * some information that may be needed to evaluate join clauses above + * the pushed-down aggregate node, or the WHERE clause. + * + * For example, the NUMERIC data type is not supported because values + * that fall into the same group according to the equality operator + * (e.g. 0 and 0.0) can have different scale. + */ + tce = lookup_type_cache(exprType((Node *) tle->expr), + TYPECACHE_BTREE_OPFAMILY); + if (!OidIsValid(tce->btree_opf) || + !OidIsValid(tce->btree_opintype)) + return; + + equalimageproc = get_opfamily_proc(tce->btree_opf, + tce->btree_opintype, + tce->btree_opintype, + BTEQUALIMAGE_PROC); + if (!OidIsValid(equalimageproc) || + !DatumGetBool(OidFunctionCall1Coll(equalimageproc, + tce->typcollation, + ObjectIdGetDatum(tce->btree_opintype)))) + return; + + /* + * Get the operator in the btree's opfamily. 
+ */ + eq_op = get_opfamily_member(tce->btree_opf, + tce->btree_opintype, + tce->btree_opintype, + BTEqualStrategyNumber); + if (!OidIsValid(eq_op)) + return; + eq_opfamilies = get_mergejoin_opfamilies(eq_op); + if (!eq_opfamilies) + return; + btree_opfamily = linitial_oid(eq_opfamilies); + + exprs = lappend(exprs, tle->expr); + sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref); + btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily); + } + + /* + * Construct GroupExprInfo for each expression. + */ + forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies) + { + Expr *expr = (Expr *) lfirst(lc1); + int sortgroupref = lfirst_int(lc2); + Oid btree_opfamily = lfirst_oid(lc3); + GroupExprInfo *ge_info; + + ge_info = makeNode(GroupExprInfo); + ge_info->expr = (Expr *) copyObject(expr); + ge_info->sortgroupref = sortgroupref; + ge_info->btree_opfamily = btree_opfamily; + + root->group_expr_list = lappend(root->group_expr_list, ge_info); + } +} /***************************************************************************** * diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index fd8b2b0ca3..5d2bca914b 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -77,6 +77,8 @@ query_planner(PlannerInfo *root, root->placeholder_list = NIL; root->placeholder_array = NULL; root->placeholder_array_size = 0; + root->agg_clause_list = NIL; + root->group_expr_list = NIL; root->fkey_list = NIL; root->initial_rels = NIL; @@ -258,6 +260,12 @@ query_planner(PlannerInfo *root, */ extract_restriction_or_clauses(root); + /* + * Check if eager aggregation is applicable, and if so, set up + * root->agg_clause_list and root->group_expr_list. + */ + setup_eager_aggregation(root); + /* * Now expand appendrels by adding "otherrels" for their children. 
We * delay this to the end so that we have as much information as possible diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index d28b0bcb40..d3b37af81a 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -931,6 +931,16 @@ struct config_bool ConfigureNamesBool[] = false, NULL, NULL, NULL }, + { + {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD, + gettext_noop("Enables eager aggregation."), + NULL, + GUC_EXPLAIN + }, + &enable_eager_aggregate, + false, + NULL, NULL, NULL + }, { {"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD, gettext_noop("Enables the planner's use of parallel append plans."), diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 9ec9f97e92..3f03ef0438 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -413,6 +413,7 @@ #enable_sort = on #enable_tidscan = on #enable_group_by_reordering = on +#enable_eager_aggregate = off # - Planner Cost Constants - diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 18d1ae8cbc..683ab51e6b 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -386,6 +386,12 @@ struct PlannerInfo /* list of PlaceHolderInfos */ List *placeholder_list; + /* list of AggClauseInfos */ + List *agg_clause_list; + + /* List of GroupExprInfos */ + List *group_expr_list; + /* array of PlaceHolderInfos indexed by phid */ struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size)); /* allocated size of array */ @@ -3219,6 +3225,41 @@ typedef struct MinMaxAggInfo Param *param; } MinMaxAggInfo; +/* + * The aggregate expressions that appear in targetlist and having clauses + */ +typedef struct AggClauseInfo +{ + pg_node_attr(no_read, no_query_jumble) + + NodeTag type; + + /* the Aggref expr */ + Aggref *aggref; + + /* 
lowest level we can evaluate this aggregate at */ + Relids agg_eval_at; +} AggClauseInfo; + +/* + * The grouping expressions that appear in grouping clauses + */ +typedef struct GroupExprInfo +{ + pg_node_attr(no_read, no_query_jumble) + + NodeTag type; + + /* the represented expression */ + Expr *expr; + + /* the tleSortGroupRef of the corresponding SortGroupClause */ + Index sortgroupref; + + /* btree opfamily defining the ordering */ + Oid btree_opfamily; +} GroupExprInfo; + /* * At runtime, PARAM_EXEC slots are used to pass values around from one plan * node to another. They can be used to pass values down into subqueries (for diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index 5e88c0224a..d8199333c9 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -21,6 +21,7 @@ * allpaths.c */ extern PGDLLIMPORT bool enable_geqo; +extern PGDLLIMPORT bool enable_eager_aggregate; extern PGDLLIMPORT int geqo_threshold; extern PGDLLIMPORT int min_parallel_table_scan_size; extern PGDLLIMPORT int min_parallel_index_scan_size; diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index aafc173792..cedcd88ebf 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root); extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist); extern void add_vars_to_targetlist(PlannerInfo *root, List *vars, Relids where_needed); +extern void setup_eager_aggregation(PlannerInfo *root); extern void find_lateral_references(PlannerInfo *root); extern void create_lateral_join_info(PlannerInfo *root); extern List *deconstruct_jointree(PlannerInfo *root); diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out index 729620de13..46d6645bd8 100644 --- a/src/test/regress/expected/sysviews.out +++ b/src/test/regress/expected/sysviews.out @@ -136,6 +136,7 @@ select name, 
setting from pg_settings where name like 'enable%'; --------------------------------+--------- enable_async_append | on enable_bitmapscan | on + enable_eager_aggregate | off enable_gathermerge | on enable_group_by_reordering | on enable_hashagg | on @@ -156,7 +157,7 @@ select name, setting from pg_settings where name like 'enable%'; enable_seqscan | on enable_sort | on enable_tidscan | on -(22 rows) +(23 rows) -- There are always wait event descriptions for various types. InjectionPoint -- may be present or absent, depending on history since last postmaster start. -- 2.43.0 [application/octet-stream] v9-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch (29.7K, 5-v9-0004-Implement-functions-that-create-RelAggInfos-if-applicable.patch) download | inline diff: From 60437198177719782d13c60cd0309406045e707b Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Tue, 11 Jun 2024 16:04:41 +0900 Subject: [PATCH v9 04/10] Implement functions that create RelAggInfos if applicable This commit implements the functions that check if eager aggregation is applicable for a given relation, and if so, create RelAggInfo structure for the relation, using the infos about aggregate expressions and grouping expressions we collected earlier. 
--- src/backend/optimizer/path/equivclass.c | 26 +- src/backend/optimizer/plan/initsplan.c | 24 +- src/backend/optimizer/plan/planmain.c | 4 + src/backend/optimizer/util/relnode.c | 647 ++++++++++++++++++++++++ src/backend/utils/adt/selfuncs.c | 5 +- src/include/nodes/pathnodes.h | 11 +- src/include/optimizer/pathnode.h | 5 + src/include/optimizer/paths.h | 3 +- 8 files changed, 704 insertions(+), 21 deletions(-) diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 51d806326e..d871396e20 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -2443,15 +2443,17 @@ find_join_domain(PlannerInfo *root, Relids relids) * Detect whether two expressions are known equal due to equivalence * relationships. * - * Actually, this only shows that the expressions are equal according - * to some opfamily's notion of equality --- but we only use it for - * selectivity estimation, so a fuzzy idea of equality is OK. + * If opfamily is given, the expressions must be known equal per the semantics + * of that opfamily (note it has to be a btree opfamily, since those are the + * only opfamilies equivclass.c deals with). If opfamily is InvalidOid, we'll + * return true if they're equal according to any opfamily, which is fuzzy but + * OK for estimation purposes. * * Note: does not bother to check for "equal(item1, item2)"; caller must * check that case if it's possible to pass identical items. */ bool -exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2) +exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, Oid opfamily) { ListCell *lc1; @@ -2466,6 +2468,17 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2) if (ec->ec_has_volatile) continue; + /* + * It's okay to consider ec_broken ECs here. Brokenness just means we + * couldn't derive all the implied clauses we'd have liked to; it does + * not invalidate our knowledge that the members are equal. 
+ */ + + /* Ignore if this EC doesn't use specified opfamily */ + if (OidIsValid(opfamily) && + !list_member_oid(ec->ec_opfamilies, opfamily)) + continue; + foreach(lc2, ec->ec_members) { EquivalenceMember *em = (EquivalenceMember *) lfirst(lc2); @@ -2494,8 +2507,7 @@ exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2) * (In principle there might be more than one matching eclass if multiple * collations are involved, but since collation doesn't matter for equality, * we ignore that fine point here.) This is much like exprs_known_equal, - * except that we insist on the comparison operator matching the eclass, so - * that the result is definite not approximate. + * except for the format of the input. * * On success, we also set fkinfo->eclass[colno] to the matching eclass, * and set fkinfo->fk_eclass_member[colno] to the eclass member for the @@ -2536,7 +2548,7 @@ match_eclasses_to_foreign_key_col(PlannerInfo *root, /* Never match to a volatile EC */ if (ec->ec_has_volatile) continue; - /* Note: it seems okay to match to "broken" eclasses here */ + /* It's okay to consider "broken" ECs here, see exprs_known_equal */ foreach(lc2, ec->ec_members) { diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 4e51213410..9f05edfbac 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -381,8 +381,8 @@ setup_eager_aggregation(PlannerInfo *root) return; /* - * Collect aggregate expressions that appear in targetlist and having - * clauses. + * Collect aggregate expressions and plain Vars that appear in targetlist + * and having clauses. */ create_agg_clause_infos(root); @@ -400,10 +400,9 @@ setup_eager_aggregation(PlannerInfo *root) } /* - * Create AggClauseInfo for each aggregate. - * - * If any aggregate is not suitable, set root->agg_clause_list to NIL and - * return. 
+ * create_agg_clause_infos + * Search the targetlist and havingQual for Aggrefs and plain Vars, and + * create an AggClauseInfo for each Aggref node. */ static void create_agg_clause_infos(PlannerInfo *root) @@ -412,6 +411,7 @@ create_agg_clause_infos(PlannerInfo *root) ListCell *lc; Assert(root->agg_clause_list == NIL); + Assert(root->tlist_vars == NIL); tlist_exprs = pull_var_clause((Node *) root->processed_tlist, PVC_INCLUDE_AGGREGATES | @@ -455,10 +455,13 @@ create_agg_clause_infos(PlannerInfo *root) AggClauseInfo *ac_info; /* - * tlist_exprs may also contain Vars, but we only need Aggrefs. + * collect plain Vars for future reference */ if (IsA(expr, Var)) + { + root->tlist_vars = list_append_unique(root->tlist_vars, expr); continue; + } aggref = castNode(Aggref, expr); @@ -477,10 +480,11 @@ create_agg_clause_infos(PlannerInfo *root) } /* - * Create GroupExprInfo for each expression usable as grouping key. + * create_grouping_expr_infos + * Create GroupExprInfo for each expression usable as grouping key. * - * If any grouping expression is not suitable, set root->group_expr_list to NIL - * and return. + * If any grouping expression is not suitable, we will just return with + * root->group_expr_list being NIL. 
*/ static void create_grouping_expr_infos(PlannerInfo *root) diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index 5d2bca914b..ece6936e23 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -67,6 +67,9 @@ query_planner(PlannerInfo *root, root->join_rel_list = makeNode(RelInfoList); root->join_rel_list->items = NIL; root->join_rel_list->hash = NULL; + root->agg_info_list = makeNode(RelInfoList); + root->agg_info_list->items = NIL; + root->agg_info_list->hash = NULL; root->join_rel_level = NULL; root->join_cur_level = 0; root->canon_pathkeys = NIL; @@ -79,6 +82,7 @@ query_planner(PlannerInfo *root, root->placeholder_array_size = 0; root->agg_clause_list = NIL; root->group_expr_list = NIL; + root->tlist_vars = NIL; root->fkey_list = NIL; root->initial_rels = NIL; diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 8420b8936e..27f779d778 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -87,6 +87,15 @@ static void build_child_join_reltarget(PlannerInfo *root, RelOptInfo *childrel, int nappinfos, AppendRelInfo **appinfos); +static bool eager_aggregation_possible_for_relation(PlannerInfo *root, + RelOptInfo *rel); +static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, + PathTarget *target, PathTarget *agg_input, + List **group_exprs_extra_p); +static bool is_var_in_aggref_only(PlannerInfo *root, Var *var); +static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel, + bool *safe_to_push); +static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr); /* @@ -647,6 +656,58 @@ add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) add_rel_info(root->join_rel_list, joinrel); } +/* + * add_grouped_rel + * Add grouped base or join relation to the list of grouped relations in + * the given PlannerInfo. 
Also add the corresponding RelAggInfo to + * root->agg_info_list. + */ +void +add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info) +{ + add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel); + add_rel_info(root->agg_info_list, agg_info); +} + +/* + * find_grouped_rel + * Returns grouped relation entry (base or join relation) corresponding to + * 'relids' or NULL if none exists. + * + * If agg_info_p is not NULL, then also the corresponding RelAggInfo (if one + * exists) will be returned in *agg_info_p. + */ +RelOptInfo * +find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p) +{ + RelOptInfo *rel; + + rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], + relids); + if (rel == NULL) + { + if (agg_info_p) + *agg_info_p = NULL; + + return NULL; + } + + /* also return the corresponding RelAggInfo, if asked */ + if (agg_info_p) + { + RelAggInfo *agg_info; + + agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids); + + /* The relation exists, so the agg_info should be there too. */ + Assert(agg_info != NULL); + + *agg_info_p = agg_info; + } + + return rel; +} + /* * set_foreign_rel_properties * Set up foreign-join fields if outer and inner relation are foreign @@ -2483,3 +2544,589 @@ build_child_join_reltarget(PlannerInfo *root, childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple; childrel->reltarget->width = parentrel->reltarget->width; } + +/* + * create_rel_agg_info + * Check if the given relation can produce grouped paths and return the + * information it'll need for it. The given relation is the non-grouped one + * which has the reltarget already constructed. 
+ */ +RelAggInfo * +create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel) +{ + ListCell *lc; + RelAggInfo *result; + PathTarget *agg_input; + PathTarget *target; + List *grp_exprs_extra = NIL; + List *group_clauses_final; + int i; + + /* + * The lists of aggregate expressions and grouping expressions should have + * been constructed. + */ + Assert(root->agg_clause_list != NIL); + Assert(root->group_expr_list != NIL); + + /* + * If this is a child rel, the grouped rel for its parent rel must have + * been created if it can. So we can just use parent's RelAggInfo if there + * is one, with appropriate variable substitutions. + */ + if (IS_OTHER_REL(rel)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + Assert(!bms_is_empty(rel->top_parent_relids)); + rel_grouped = find_grouped_rel(root, rel->top_parent_relids, &agg_info); + + if (rel_grouped == NULL) + return NULL; + + Assert(agg_info != NULL); + /* Must do multi-level transformation */ + agg_info = (RelAggInfo *) + adjust_appendrel_attrs_multilevel(root, + (Node *) agg_info, + rel, + rel->top_parent); + + agg_info->input_rows = rel->rows; + agg_info->grouped_rows = + estimate_num_groups(root, agg_info->group_exprs, + agg_info->input_rows, NULL, NULL); + + return agg_info; + } + + /* Check if it's possible to produce grouped paths for this relation. */ + if (!eager_aggregation_possible_for_relation(root, rel)) + return NULL; + + /* + * Create targets for the grouped paths and for the input paths of the + * grouped paths. 
+ */ + target = create_empty_pathtarget(); + agg_input = create_empty_pathtarget(); + + /* initialize 'target' and 'agg_input' */ + if (!init_grouping_targets(root, rel, target, agg_input, &grp_exprs_extra)) + return NULL; + + /* Eager aggregation makes no sense w/o grouping expressions */ + if ((list_length(target->exprs) + list_length(grp_exprs_extra)) == 0) + return NULL; + + group_clauses_final = root->parse->groupClause; + + /* + * If the aggregation target should have extra grouping expressions (in + * order to emit input vars for join conditions), add them now. This step + * includes assignment of tleSortGroupRef's which we can generate now. + */ + if (list_length(grp_exprs_extra) > 0) + { + Index sortgroupref; + + /* + * Make a copy of the group clauses as we'll need to add some more + * clauses. + */ + group_clauses_final = list_copy(group_clauses_final); + + /* find out the current max sortgroupref */ + sortgroupref = 0; + foreach(lc, root->processed_tlist) + { + Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref; + + if (ref > sortgroupref) + sortgroupref = ref; + } + + /* + * Generate the SortGroupClause's and add the expressions to the + * target. + */ + foreach(lc, grp_exprs_extra) + { + Var *var = lfirst_node(Var, lc); + SortGroupClause *cl = makeNode(SortGroupClause); + + /* + * Initialize the SortGroupClause. + * + * As the final aggregation will not use this grouping expression, + * we don't care whether sortop is < or >. The value of nulls_first + * should not matter for the same reason. + */ + cl->tleSortGroupRef = ++sortgroupref; + get_sort_group_operators(var->vartype, + false, true, false, + &cl->sortop, &cl->eqop, NULL, + &cl->hashable); + group_clauses_final = lappend(group_clauses_final, cl); + add_column_to_pathtarget(target, (Expr *) var, + cl->tleSortGroupRef); + + /* + * The aggregation input target must emit this var too. 
+ */ + add_column_to_pathtarget(agg_input, (Expr *) var, + cl->tleSortGroupRef); + } + } + + /* + * Build a list of grouping expressions and a list of the corresponding + * SortGroupClauses. + */ + i = 0; + result = makeNode(RelAggInfo); + foreach(lc, target->exprs) + { + Index sortgroupref = 0; + SortGroupClause *cl; + Expr *texpr; + + texpr = (Expr *) lfirst(lc); + + Assert(IsA(texpr, Var)); + + sortgroupref = target->sortgrouprefs[i++]; + if (sortgroupref == 0) + continue; + + /* find the SortGroupClause in group_clauses_final */ + cl = get_sortgroupref_clause(sortgroupref, group_clauses_final); + + /* do not add this SortGroupClause if it has already been added */ + if (list_member(result->group_clauses, cl)) + continue; + + result->group_clauses = lappend(result->group_clauses, cl); + result->group_exprs = list_append_unique(result->group_exprs, + texpr); + } + + /* + * Calculate pathkeys that represent this grouping requirements. + */ + result->group_pathkeys = + make_pathkeys_for_sortclauses(root, result->group_clauses, + make_tlist_from_pathtarget(target)); + + /* + * Add aggregates to the grouping target. + */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + Aggref *aggref; + + Assert(IsA(ac_info->aggref, Aggref)); + + aggref = (Aggref *) copyObject(ac_info->aggref); + mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL); + + add_column_to_pathtarget(target, (Expr *) aggref, 0); + + result->agg_exprs = lappend(result->agg_exprs, aggref); + } + + /* + * Since neither target nor agg_input is supposed to be identical to the + * source reltarget, compute the width and cost again. 
+ */ + set_pathtarget_cost_width(root, target); + set_pathtarget_cost_width(root, agg_input); + + result->relids = bms_copy(rel->relids); + result->target = target; + result->agg_input = agg_input; + + /* + * The number of aggregation input rows is simply the number of rows of the + * non-grouped relation, which should have been estimated by now. + */ + result->input_rows = rel->rows; + + /* Estimate the number of groups with equal grouped exprs. */ + result->grouped_rows = estimate_num_groups(root, result->group_exprs, + result->input_rows, NULL, NULL); + + return result; +} + +/* + * eager_aggregation_possible_for_relation + * Check if it's possible to produce grouped paths for the given relation. + */ +static bool +eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel) +{ + ListCell *lc; + + /* + * The current implementation of eager aggregation cannot handle + * PlaceHolderVar (PHV). + * + * If we knew that the PHV should be evaluated in this target (and of + * course, if its expression matched some Aggref argument), we'd just let + * init_grouping_targets add that Aggref. On the other hand, if we knew + * that the PHV is evaluated below the current rel, we could ignore it + * because the referencing Aggref would take care of propagation of the + * value to upper joins. + * + * The problem is that the same PHV can be evaluated in the target of the + * current rel or in that of lower rel --- depending on the input paths. + * For example, consider rel->relids = {A, B, C} and if ph_eval_at = {B, + * C}. Path "A JOIN (B JOIN C)" implies that the PHV is evaluated by the + * "(B JOIN C)", while path "(A JOIN B) JOIN C" evaluates the PHV itself. + */ + foreach(lc, rel->reltarget->exprs) + { + Expr *expr = lfirst(lc); + + if (IsA(expr, PlaceHolderVar)) + return false; + } + + if (IS_SIMPLE_REL(rel)) + { + RangeTblEntry *rte = root->simple_rte_array[rel->relid]; + + /* + * rtekind != RTE_RELATION case is not supported yet. 
+ */ + if (rte->rtekind != RTE_RELATION) + return false; + } + + /* Caller should only pass base relations or joins. */ + Assert(rel->reloptkind == RELOPT_BASEREL || + rel->reloptkind == RELOPT_JOINREL); + + /* + * Check if all aggregate expressions can be evaluated on this relation + * level. + */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + + Assert(IsA(ac_info->aggref, Aggref)); + + /* + * Give up if any aggregate needs relations other than the current one. + * + * If the aggregate needs the current rel plus anything else, then the + * problem is that grouping of the current relation could make some + * input variables unavailable for the "higher aggregate", and it'd + * also decrease the number of input rows the "higher aggregate" + * receives. + * + * If the aggregate does not even need the current rel, then the + * current rel should not be grouped because we do not support join of + * two grouped relations. + */ + if (!bms_is_subset(ac_info->agg_eval_at, rel->relids)) + return false; + } + + /* + * Check if all grouping expressions that are applicable to this relation + * can be evaluated on this relation level. + */ + foreach(lc, root->group_expr_list) + { + GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc); + Var *ge_var = castNode(Var, ge_info->expr); + + /* + * Not interested if the grouping expression is not applicable to this + * relation. + */ + if (!bms_is_member(ge_var->varno, rel->relids)) + continue; + + /* + * Give up if any grouping expression can be nulled by an outer join + * above this relation. + */ + if (!bms_is_subset(ge_var->varnullingrels, rel->relids)) + return false; + } + + return true; +} + +/* + * init_grouping_targets + * Initialize target for grouped paths (target) as well as a target for + * paths that generate input for the grouped paths (agg_input). + * + * group_exprs_extra_p receives a list of Var nodes for which we need to + * construct SortGroupClause.
Those Vars will then be used as additional + * grouping expressions, for the sake of join clauses. + * + * Return true iff the targets could be initialized. + */ +static bool +init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, + PathTarget *target, PathTarget *agg_input, + List **group_exprs_extra_p) +{ + ListCell *lc; + List *possibly_dependent = NIL; + + foreach(lc, rel->reltarget->exprs) + { + Expr *expr = (Expr *) lfirst(lc); + Index sortgroupref; + + /* + * Given that PlaceHolderVar currently prevents us from doing eager + * aggregation, the source target cannot contain anything more complex + * than a Var. + */ + Assert(IsA(expr, Var)); + + /* Get the sortgroupref if the expr can act as grouping expression. */ + sortgroupref = get_expression_sortgroupref(root, expr); + if (sortgroupref > 0) + { + /* + * If the target expression can be used as the grouping key, it + * should be emitted by the grouped paths that have been pushed + * down to this relation level. + */ + add_column_to_pathtarget(target, expr, sortgroupref); + + /* + * ... and it also should be emitted by the input paths + */ + add_column_to_pathtarget(agg_input, expr, sortgroupref); + } + else + { + bool safe_to_push; + + if (is_var_needed_by_join(root, (Var *) expr, rel, &safe_to_push)) + { + /* + * Give up if this expression is not safe to be used as a + * grouping key at this relation level. + */ + if (!safe_to_push) + return false; + + /* + * The expression is needed for a join, however it's neither in + * the GROUP BY clause nor can it be derived from it using EC. + * (Otherwise it would have already been added to the targets + * above.) We need to construct a special SortGroupClause for + * this expression. + * + * Note that its tleSortGroupRef needs to be unique within + * agg_input, so we need to postpone creation of this + * SortGroupClause until we're done with the iteration of + * rel->reltarget->exprs. 
And it makes sense for the caller to + * do some more checks before it starts to create those + * SortGroupClauses. + */ + *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr); + } + else if (is_var_in_aggref_only(root, (Var *) expr)) + { + /* + * Another reason we might need this variable is that some + * aggregate pushed down to this relation references it. In + * such a case, add it to "agg_input", but not to "target". + * However, if the aggregate is not the only reason for the var + * to be in the target, some more checks need to be performed + * below. + */ + add_new_column_to_pathtarget(agg_input, expr); + } + else + { + /* + * The Var can be functionally dependent on another expression + * of the target, but we cannot check that until we've built + * all the expressions for the target. + */ + possibly_dependent = lappend(possibly_dependent, expr); + } + } + } + + /* + * Now we can check whether the expression is functionally dependent on + * another one. + */ + foreach(lc, possibly_dependent) + { + Var *tvar; + List *deps = NIL; + RangeTblEntry *rte; + + tvar = lfirst_node(Var, lc); + rte = root->simple_rte_array[tvar->varno]; + + /* + * Check if the Var can be in the grouping key even though it's not + * mentioned by the GROUP BY clause (and could not be derived using + * ECs). + */ + if (check_functional_grouping(rte->relid, tvar->varno, + tvar->varlevelsup, + target->exprs, &deps)) + { + /* + * The var shouldn't be actually used for grouping key evaluation + * (instead, the one this depends on will be), so sortgroupref + * should not be important. + */ + add_new_column_to_pathtarget(target, (Expr *) tvar); + add_new_column_to_pathtarget(agg_input, (Expr *) tvar); + } + else + { + /* + * As long as the query is semantically correct, arriving here + * means that the var is referenced by a generic grouping + * expression but not referenced by any join. 
+ * + * If the eager aggregation will support generic grouping + * expression in the future, create_rel_agg_info() will have to add + * this variable to "agg_input" target and also add the whole + * generic expression to "target". + */ + return false; + } + } + + return true; +} + +/* + * is_var_in_aggref_only + * Check whether the given Var appears in aggregate expressions and not + * elsewhere in the targetlist and havingQual. + */ +static bool +is_var_in_aggref_only(PlannerInfo *root, Var *var) +{ + ListCell *lc; + + /* + * Search the list of aggregate expressions for the Var. + */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + List *vars; + + Assert(IsA(ac_info->aggref, Aggref)); + + if (!bms_is_member(var->varno, ac_info->agg_eval_at)) + continue; + + vars = pull_var_clause((Node *) ac_info->aggref, + PVC_RECURSE_AGGREGATES | + PVC_RECURSE_WINDOWFUNCS | + PVC_RECURSE_PLACEHOLDERS); + + if (list_member(vars, var)) + { + list_free(vars); + break; + } + + list_free(vars); + } + + return (lc != NULL && !list_member(root->tlist_vars, var)); +} + +/* + * is_var_needed_by_join + * Check if the given Var is needed by joins above the current rel. We also + * return in '*safe_to_push' whether it's safe to use this Var as a grouping + * key at this rel level. + * + * Consider pushing the aggregate avg(b.y) down to relation b for the following + * query: + * + * SELECT a.i, avg(b.y) + * FROM a JOIN b ON a.j = b.j + * GROUP BY a.i; + * + * Column b.j needs to be used as the grouping key because otherwise it cannot + * find its way to the input of the join expression. + */ +static bool +is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel, + bool *safe_to_push) +{ + Relids relids; + int attno; + RelOptInfo *baserel; + + /* + * Note that when we are checking if the Var is needed by joins above, we + * want to exclude the situation where the Var is only needed in final + * output. 
So include "relation 0" here. + */ + relids = bms_copy(rel->relids); + relids = bms_add_member(relids, 0); + + baserel = find_base_rel(root, var->varno); + attno = var->varattno - baserel->min_attr; + + /* + * If the baserel this Var belongs to can be nulled by outer joins that are + * above the current rel, then it is not safe to use this Var as a grouping + * key at current rel level. + */ + *safe_to_push = bms_is_subset(baserel->nulling_relids, rel->relids); + + return bms_nonempty_difference(baserel->attr_needed[attno], relids); +} + +/* + * get_expression_sortgroupref + * Return sortgroupref if the given 'expr' can be used as a grouping + * expression in grouped paths for base or join relations, or 0 otherwise. + * + * Note that we also need to check if the 'expr' is known equal to other exprs + * due to equivalence relationships that can act as grouping expressions. + */ +static Index +get_expression_sortgroupref(PlannerInfo *root, Expr *expr) +{ + ListCell *lc; + + foreach(lc, root->group_expr_list) + { + GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc); + + Assert(IsA(ge_info->expr, Var)); + + if (equal(ge_info->expr, expr) || + exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr, + ge_info->btree_opfamily)) + { + Assert(ge_info->sortgroupref > 0); + + return ge_info->sortgroupref; + } + } + + /* The expression cannot be used as grouping key. */ + return 0; +} diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 5f5d7959d8..877a62a62e 100644 --- a/src/backend/utils/adt/selfuncs.c +++ b/src/backend/utils/adt/selfuncs.c @@ -3313,10 +3313,11 @@ add_unique_group_var(PlannerInfo *root, List *varinfos, /* * Drop known-equal vars, but only if they belong to different - * relations (see comments for estimate_num_groups) + * relations (see comments for estimate_num_groups). We aren't too + * fussy about the semantics of "equal" here. 
*/ if (vardata->rel != varinfo->rel && - exprs_known_equal(root, var, varinfo->var)) + exprs_known_equal(root, var, varinfo->var, InvalidOid)) { if (varinfo->ndistinct <= ndistinct) { diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 683ab51e6b..fd10498028 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -389,9 +389,12 @@ struct PlannerInfo /* list of AggClauseInfos */ List *agg_clause_list; - /* List of GroupExprInfos */ + /* list of GroupExprInfos */ List *group_expr_list; + /* list of plain Vars contained in targetlist and havingQual */ + List *tlist_vars; + /* array of PlaceHolderInfos indexed by phid */ struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size)); /* allocated size of array */ @@ -434,6 +437,12 @@ struct PlannerInfo */ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); + /* + * list of grouped relation RelAggInfos. One instance of RelAggInfo per + * item of the upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list. 
+ */ + RelInfoList *agg_info_list; + /* Result tlists chosen by grouping_planner for upper-stage processing */ struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index 112e7c23d4..02da68a753 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -314,6 +314,10 @@ extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid); extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid); extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid); extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids); +extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, + RelAggInfo *agg_info); +extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids, + RelAggInfo **agg_info_p); extern RelOptInfo *build_join_rel(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer_rel, @@ -348,4 +352,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root, RelOptInfo *parent_joinrel, List *restrictlist, SpecialJoinInfo *sjinfo); +extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel); #endif /* PATHNODE_H */ diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index d8199333c9..ae7a8ed742 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -159,7 +159,8 @@ extern List *generate_join_implied_equalities_for_ecs(PlannerInfo *root, Relids join_relids, Relids outer_relids, RelOptInfo *inner_rel); -extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2); +extern bool exprs_known_equal(PlannerInfo *root, Node *item1, Node *item2, + Oid opfamily); extern EquivalenceClass *match_eclasses_to_foreign_key_col(PlannerInfo *root, ForeignKeyOptInfo *fkinfo, int colno); -- 2.43.0 [application/octet-stream] v9-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch (13.1K, 
6-v9-0005-Implement-functions-that-generate-paths-for-grouped-relations.patch) download | inline diff: From 0df60b1a5567eee57b7437677d10d77c7060ba44 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Tue, 11 Jun 2024 16:05:50 +0900 Subject: [PATCH v9 05/10] Implement functions that generate paths for grouped relations This commit implements the functions that generate paths for grouped relations by adding sorted and hashed partial aggregation paths on top of paths of the plain base or join relations. --- src/backend/optimizer/path/allpaths.c | 307 ++++++++++++++++++++++++++ src/backend/optimizer/util/pathnode.c | 12 +- src/include/optimizer/paths.h | 4 + 3 files changed, 315 insertions(+), 8 deletions(-) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index d1b974367b..0c2fae9608 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -40,6 +40,7 @@ #include "optimizer/paths.h" #include "optimizer/plancat.h" #include "optimizer/planner.h" +#include "optimizer/prep.h" #include "optimizer/tlist.h" #include "parser/parse_clause.h" #include "parser/parsetree.h" @@ -47,6 +48,7 @@ #include "port/pg_bitutils.h" #include "rewrite/rewriteManip.h" #include "utils/lsyscache.h" +#include "utils/selfuncs.h" /* Bitmask flags for pushdown_safety_info.unsafeFlags */ @@ -3296,6 +3298,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r } } +/* + * generate_grouped_paths + * Generate paths for a grouped relation by adding sorted and hashed + * partial aggregation paths on top of paths of the plain base or join + * relation. + * + * The information needed is provided by the RelAggInfo structure.
+ */ +void +generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, + RelOptInfo *rel_plain, RelAggInfo *agg_info) +{ + AggClauseCosts agg_costs; + bool can_hash; + bool can_sort; + Path *cheapest_total_path = NULL; + Path *cheapest_partial_path = NULL; + double dNumGroups = 0; + double dNumPartialGroups = 0; + + if (IS_DUMMY_REL(rel_plain)) + { + mark_dummy_rel(rel_grouped); + return; + } + + MemSet(&agg_costs, 0, sizeof(AggClauseCosts)); + get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs); + + /* + * Determine whether it's possible to perform sort-based implementations of + * grouping. + */ + can_sort = grouping_is_sortable(agg_info->group_clauses); + + /* + * Determine whether we should consider hash-based implementations of + * grouping. + */ + Assert(root->numOrderedAggs == 0); + can_hash = (agg_info->group_clauses != NIL && + grouping_is_hashable(agg_info->group_clauses)); + + /* + * Consider whether we should generate partially aggregated non-partial + * paths. We can only do this if we have a non-partial path. + */ + if (rel_plain->pathlist != NIL) + { + cheapest_total_path = rel_plain->cheapest_total_path; + Assert(cheapest_total_path != NULL); + } + + /* + * If parallelism is possible for rel_grouped, then we should consider + * generating partially-grouped partial paths. However, if the plain rel + * has no partial paths, then we can't. + */ + if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL) + { + cheapest_partial_path = linitial(rel_plain->partial_pathlist); + Assert(cheapest_partial_path != NULL); + } + + /* Estimate number of partial groups. 
*/ + if (cheapest_total_path != NULL) + dNumGroups = estimate_num_groups(root, + agg_info->group_exprs, + cheapest_total_path->rows, + NULL, NULL); + if (cheapest_partial_path != NULL) + dNumPartialGroups = estimate_num_groups(root, + agg_info->group_exprs, + cheapest_partial_path->rows, + NULL, NULL); + + if (can_sort && cheapest_total_path != NULL) + { + ListCell *lc; + + /* + * Use any available suitably-sorted path as input, and also consider + * sorting the cheapest-total path. + */ + foreach(lc, rel_plain->pathlist) + { + Path *input_path = (Path *) lfirst(lc); + Path *path; + bool is_sorted; + int presorted_keys; + + /* + * Since the path originates from the non-grouped relation which is + * not aware of eager aggregation, we must ensure that it provides + * the correct input for the partial aggregation. + */ + path = (Path *) create_projection_path(root, + rel_grouped, + input_path, + agg_info->agg_input); + + is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys, + path->pathkeys, + &presorted_keys); + if (!is_sorted) + { + /* + * Try at least sorting the cheapest path and also try + * incrementally sorting any path which is partially sorted + * already (no need to deal with paths which have presorted + * keys when incremental sort is disabled unless it's the + * cheapest input path). + */ + if (input_path != cheapest_total_path && + (presorted_keys == 0 || !enable_incremental_sort)) + continue; + + /* + * We've no need to consider both a sort and incremental sort. + * We'll just do a sort if there are no presorted keys and an + * incremental sort when there are presorted keys. 
+ */ + if (presorted_keys == 0 || !enable_incremental_sort) + path = (Path *) create_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + -1.0); + else + path = (Path *) create_incremental_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + presorted_keys, + -1.0); + } + + /* + * qual is NIL because the HAVING clause cannot be evaluated until the + * final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_SORTED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumGroups); + + add_path(rel_grouped, path); + } + } + + if (can_sort && cheapest_partial_path != NULL) + { + ListCell *lc; + + /* Similar to above logic, but for partial paths. */ + foreach(lc, rel_plain->partial_pathlist) + { + Path *input_path = (Path *) lfirst(lc); + Path *path; + bool is_sorted; + int presorted_keys; + + /* + * Since the path originates from the non-grouped relation which is + * not aware of eager aggregation, we must ensure that it provides + * the correct input for the partial aggregation. + */ + path = (Path *) create_projection_path(root, + rel_grouped, + input_path, + agg_info->agg_input); + + is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys, + path->pathkeys, + &presorted_keys); + + if (!is_sorted) + { + /* + * Try at least sorting the cheapest path and also try + * incrementally sorting any path which is partially sorted + * already (no need to deal with paths which have presorted + * keys when incremental sort is disabled unless it's the + * cheapest input path). + */ + if (input_path != cheapest_partial_path && + (presorted_keys == 0 || !enable_incremental_sort)) + continue; + + /* + * We've no need to consider both a sort and incremental sort. + * We'll just do a sort if there are no presorted keys and an + * incremental sort when there are presorted keys. 
+ */ + if (presorted_keys == 0 || !enable_incremental_sort) + path = (Path *) create_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + -1.0); + else + path = (Path *) create_incremental_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + presorted_keys, + -1.0); + } + + /* + * qual is NIL because the HAVING clause cannot be evaluated until the + * final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_SORTED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumPartialGroups); + + add_partial_path(rel_grouped, path); + } + } + + /* + * Add a partially-grouped HashAgg Path where possible + */ + if (can_hash && cheapest_total_path != NULL) + { + Path *path; + + /* + * Since the path originates from the non-grouped relation which is + * not aware of eager aggregation, we must ensure that it provides + * the correct input for the partial aggregation. + */ + path = (Path *) create_projection_path(root, + rel_grouped, + cheapest_total_path, + agg_info->agg_input); + + /* + * qual is NIL because the HAVING clause cannot be evaluated until + * the final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_HASHED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumGroups); + + add_path(rel_grouped, path); + } + + /* + * Now add a partially-grouped HashAgg partial Path where possible + */ + if (can_hash && cheapest_partial_path != NULL) + { + Path *path; + + /* + * Since the path originates from the non-grouped relation which is + * not aware of eager aggregation, we must ensure that it provides + * the correct input for the partial aggregation. 
+ */ + path = (Path *) create_projection_path(root, + rel_grouped, + cheapest_partial_path, + agg_info->agg_input); + + /* + * qual is NIL because the HAVING clause cannot be evaluated until + * the final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_HASHED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumPartialGroups); + + add_partial_path(rel_grouped, path); + } +} + /* * make_rel_from_joinlist * Build access paths using a "joinlist" to guide the join path search. diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index c42742d2c7..8de9825f80 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -2709,8 +2709,7 @@ create_projection_path(PlannerInfo *root, pathnode->path.pathtype = T_Result; pathnode->path.parent = rel; pathnode->path.pathtarget = target; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe && @@ -2962,8 +2961,7 @@ create_incremental_sort_path(PlannerInfo *root, pathnode->path.parent = rel; /* Sort doesn't project, so use source path's pathtarget */ pathnode->path.pathtarget = subpath->pathtarget; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; @@ -3009,8 +3007,7 @@ create_sort_path(PlannerInfo *root, pathnode->path.parent = rel; /* Sort doesn't project, so use source path's pathtarget */ pathnode->path.pathtarget = subpath->pathtarget; - /* For now, assume we are above any joins, so no parameterization */ 
- pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; @@ -3168,8 +3165,7 @@ create_agg_path(PlannerInfo *root, pathnode->path.pathtype = T_Agg; pathnode->path.parent = rel; pathnode->path.pathtarget = target; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index ae7a8ed742..413c269091 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -58,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows); extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows); +extern void generate_grouped_paths(PlannerInfo *root, + RelOptInfo *rel_grouped, + RelOptInfo *rel_plain, + RelAggInfo *agg_info); extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages, int max_workers); extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel, -- 2.43.0 [application/octet-stream] v9-0006-Build-grouped-relations-out-of-base-relations.patch (9.0K, 7-v9-0006-Build-grouped-relations-out-of-base-relations.patch) download | inline diff: From 397840b64b5d1847d0dbb62990c002badbb9c202 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Tue, 11 Jun 2024 16:07:32 +0900 Subject: [PATCH v9 06/10] Build grouped relations out of base relations This commit builds grouped relations for each base relation if possible, and generates aggregation paths for the grouped base relations. 
--- src/backend/optimizer/path/allpaths.c | 91 +++++++++++++++++++++++ src/backend/optimizer/util/relnode.c | 101 ++++++++++++++++++++++++++ src/include/optimizer/pathnode.h | 4 + 3 files changed, 196 insertions(+) diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 0c2fae9608..9219815e3d 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -93,6 +93,7 @@ join_search_hook_type join_search_hook = NULL; static void set_base_rel_consider_startup(PlannerInfo *root); static void set_base_rel_sizes(PlannerInfo *root); +static void setup_base_grouped_rels(PlannerInfo *root); static void set_base_rel_pathlists(PlannerInfo *root); static void set_rel_size(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); @@ -117,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); +static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel); static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel, List *live_childrels, List *all_child_pathkeys); @@ -185,6 +187,11 @@ make_one_rel(PlannerInfo *root, List *joinlist) */ set_base_rel_sizes(root); + /* + * Build grouped base relations for each base rel if possible. + */ + setup_base_grouped_rels(root); + /* * We should now have size estimates for every actual table involved in * the query, and we also know which if any have been deleted from the @@ -326,6 +333,59 @@ set_base_rel_sizes(PlannerInfo *root) } } +/* + * setup_base_grouped_rels + * For each "plain" base relation build a grouped base relation if eager + * aggregation is possible and if this relation can produce grouped paths. 
+ */ +static void +setup_base_grouped_rels(PlannerInfo *root) +{ + Index rti; + + /* + * If there are no aggregate expressions or grouping expressions, eager + * aggregation is not possible. + */ + if (root->agg_clause_list == NIL || + root->group_expr_list == NIL) + return; + + /* + * Eager aggregation only makes sense if there are multiple base rels in + * the query. + */ + if (bms_membership(root->all_baserels) != BMS_MULTIPLE) + return; + + for (rti = 1; rti < root->simple_rel_array_size; rti++) + { + RelOptInfo *rel = root->simple_rel_array[rti]; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + /* there may be empty slots corresponding to non-baserel RTEs */ + if (rel == NULL) + continue; + + Assert(rel->relid == rti); /* sanity check on array */ + + /* + * Ignore RTEs that are not simple rels. Note that we need to consider + * "other rels" here. + */ + if (!IS_SIMPLE_REL(rel)) + continue; + + rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info); + if (rel_grouped) + { + /* Make the grouped relation available for joining. */ + add_grouped_rel(root, rel_grouped, agg_info); + } + } +} + /* * set_base_rel_pathlists * Finds all paths available for scanning each base-relation entry. @@ -562,6 +622,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, /* Now find the cheapest of the paths for this rel */ set_cheapest(rel); + /* + * If a grouped relation for this rel exists, build partial aggregation + * paths for it. + * + * Note that this can only happen after we've called set_cheapest() for + * this base rel, because we need its cheapest paths. + */ + set_grouped_rel_pathlist(root, rel); + #ifdef OPTIMIZER_DEBUG pprint(rel); #endif @@ -1289,6 +1358,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, add_paths_to_append_rel(root, rel, live_childrels); } +/* + * set_grouped_rel_pathlist + * If a grouped relation for the given 'rel' exists, build partial + * aggregation paths for it. 
+ */ +static void +set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel) +{ + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + /* Add paths to the grouped base relation if one exists. */ + rel_grouped = find_grouped_rel(root, rel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, rel, + agg_info); + set_cheapest(rel_grouped); + } +} + /* * add_paths_to_append_rel diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 27f779d778..f8f0c0fc69 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -16,6 +16,7 @@ #include <limits.h> +#include "catalog/pg_constraint.h" #include "miscadmin.h" #include "nodes/nodeFuncs.h" #include "optimizer/appendinfo.h" @@ -27,12 +28,15 @@ #include "optimizer/paths.h" #include "optimizer/placeholder.h" #include "optimizer/plancat.h" +#include "optimizer/planner.h" #include "optimizer/restrictinfo.h" #include "optimizer/tlist.h" +#include "parser/parse_oper.h" #include "parser/parse_relation.h" #include "rewrite/rewriteManip.h" #include "utils/hsearch.h" #include "utils/lsyscache.h" +#include "utils/selfuncs.h" /* @@ -419,6 +423,103 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) return rel; } +/* + * build_simple_grouped_rel + * Construct a new RelOptInfo for a grouped base relation out of an existing + * non-grouped base relation. + * + * On success, the new RelOptInfo is returned and the corresponding RelAggInfo + * is stored in *agg_info_p. + */ +RelOptInfo * +build_simple_grouped_rel(PlannerInfo *root, int relid, + RelAggInfo **agg_info_p) +{ + RelOptInfo *rel_plain; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + /* + * We should have available aggregate expressions and grouping expressions, + * otherwise we cannot reach here. 
+ */ + Assert(root->agg_clause_list != NIL); + Assert(root->group_expr_list != NIL); + + rel_plain = root->simple_rel_array[relid]; + Assert(rel_plain != NULL); + Assert(IS_SIMPLE_REL(rel_plain)); + + /* nothing to do for dummy rel */ + if (IS_DUMMY_REL(rel_plain)) + return NULL; + + /* + * Prepare the information we need to create grouped paths for this base + * relation. + */ + agg_info = create_rel_agg_info(root, rel_plain); + if (agg_info == NULL) + return NULL; + + /* build a grouped relation out of the plain relation */ + rel_grouped = build_grouped_rel(root, rel_plain); + rel_grouped->reltarget = agg_info->target; + rel_grouped->rows = agg_info->grouped_rows; + + /* return the RelAggInfo structure */ + *agg_info_p = agg_info; + + return rel_grouped; +} + +/* + * build_grouped_rel + * Build a grouped relation by flat copying a plain relation and resetting + * the necessary fields. + */ +RelOptInfo * +build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain) +{ + RelOptInfo *rel_grouped; + + rel_grouped = makeNode(RelOptInfo); + memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo)); + + /* + * clear path info + */ + rel_grouped->pathlist = NIL; + rel_grouped->ppilist = NIL; + rel_grouped->partial_pathlist = NIL; + rel_grouped->cheapest_startup_path = NULL; + rel_grouped->cheapest_total_path = NULL; + rel_grouped->cheapest_unique_path = NULL; + rel_grouped->cheapest_parameterized_paths = NIL; + + /* + * clear partition info + */ + rel_grouped->part_scheme = NULL; + rel_grouped->nparts = -1; + rel_grouped->boundinfo = NULL; + rel_grouped->partbounds_merged = false; + rel_grouped->partition_qual = NIL; + rel_grouped->part_rels = NULL; + rel_grouped->live_parts = NULL; + rel_grouped->all_partrels = NULL; + rel_grouped->partexprs = NULL; + rel_grouped->nullable_partexprs = NULL; + rel_grouped->consider_partitionwise_join = false; + + /* + * clear size estimates + */ + rel_grouped->rows = 0; + + return rel_grouped; +} + /* * find_base_rel * Find a base or otherrel 
relation entry, which must already exist. diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index 02da68a753..525481f296 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -310,6 +310,10 @@ extern void setup_simple_rel_arrays(PlannerInfo *root); extern void expand_planner_arrays(PlannerInfo *root, int add_size); extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent); +extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid, + RelAggInfo **agg_info_p); +extern RelOptInfo *build_grouped_rel(PlannerInfo *root, + RelOptInfo *rel_plain); extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid); extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid); extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid); -- 2.43.0 [application/octet-stream] v9-0007-Build-grouped-relations-out-of-join-relations.patch (27.0K, 8-v9-0007-Build-grouped-relations-out-of-join-relations.patch) download | inline diff: From 3635321462aee998a96d6a4c7a52fa71b0548335 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Tue, 11 Jun 2024 16:08:23 +0900 Subject: [PATCH v9 07/10] Build grouped relations out of join relations This commit builds grouped relations for each just-processed join relation if possible, and generates aggregation paths for the grouped join relations. The changes made to make_join_rel() are relatively minor, with the addition of a new function make_grouped_join_rel(), which finds or creates a grouped relation for the just-processed joinrel, and generates grouped paths by joining a grouped input relation with a non-grouped input relation. The other way to generate grouped paths is by adding sorted and hashed partial aggregation paths on top of paths of the joinrel. This occurs in standard_join_search(), after we've run set_cheapest() for the joinrel. 
The reason for performing this step after set_cheapest() is that we need to know the joinrel's cheapest paths (see generate_grouped_paths()). This patch also makes the grouped relation for the topmost join rel act as the upper rel representing the result of partial aggregation, so that we can add the final aggregation on top of that. Additionally, this patch extends the functionality of eager aggregation to work with partitionwise join and geqo. This patch also makes eager aggregation work with outer joins. With outer joins, the aggregate cannot be pushed down if any column referenced by grouping expressions or aggregate functions is nullable by an outer join above the relation to which we want to apply the partial aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we can easily identify such situations and subsequently refrain from pushing down the aggregates. Starting from this patch, you should be able to see plans with eager aggregation. --- src/backend/optimizer/geqo/geqo_eval.c | 84 +++++++++++---- src/backend/optimizer/path/allpaths.c | 48 +++++++++ src/backend/optimizer/path/joinrels.c | 136 ++++++++++++++++++++++++ src/backend/optimizer/plan/planner.c | 100 ++++++++++++----- src/backend/optimizer/util/appendinfo.c | 60 +++++++++++ src/backend/optimizer/util/relnode.c | 2 - src/include/nodes/pathnodes.h | 6 -- 7 files changed, 385 insertions(+), 51 deletions(-) diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c index 1141156899..278857d767 100644 --- a/src/backend/optimizer/geqo/geqo_eval.c +++ b/src/backend/optimizer/geqo/geqo_eval.c @@ -60,8 +60,12 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) MemoryContext oldcxt; RelOptInfo *joinrel; Cost fitness; - int savelength; - struct HTAB *savehash; + int savelength_join_rel; + struct HTAB *savehash_join_rel; + int savelength_grouped_rel; + struct HTAB *savehash_grouped_rel; + int savelength_grouped_info; + struct HTAB *savehash_grouped_info; /* 
* Create a private memory context that will hold all temp storage @@ -78,25 +82,38 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) oldcxt = MemoryContextSwitchTo(mycontext); /* - * gimme_tree will add entries to root->join_rel_list, which may or may - * not already contain some entries. The newly added entries will be - * recycled by the MemoryContextDelete below, so we must ensure that the - * list is restored to its former state before exiting. We can do this by - * truncating the list to its original length. NOTE this assumes that any - * added entries are appended at the end! + * gimme_tree will add entries to root->join_rel_list, root->agg_info_list + * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not + * already contain some entries. The newly added entries will be recycled + * by the MemoryContextDelete below, so we must ensure that each list of + * the RelInfoList structures is restored to its former state before + * exiting. We can do this by truncating each list to its original length. + * NOTE this assumes that any added entries are appended at the end! * - * We also must take care not to mess up the outer join_rel_list->hash, if - * there is one. We can do this by just temporarily setting the link to - * NULL. (If we are dealing with enough join rels, which we very likely - * are, a new hash table will get built and used locally.) + * We also must take care not to mess up the outer hash tables of the + * RelInfoList structures, if any. We can do this by just temporarily + * setting each link to NULL. (If we are dealing with enough join rels, + * which we very likely are, new hash tables will get built and used + * locally.) * * join_rel_level[] shouldn't be in use, so just Assert it isn't. 
*/ - savelength = list_length(root->join_rel_list->items); - savehash = root->join_rel_list->hash; + savelength_join_rel = list_length(root->join_rel_list->items); + savehash_join_rel = root->join_rel_list->hash; + + savelength_grouped_rel = + list_length(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items); + savehash_grouped_rel = + root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash; + + savelength_grouped_info = list_length(root->agg_info_list->items); + savehash_grouped_info = root->agg_info_list->hash; + Assert(root->join_rel_level == NULL); root->join_rel_list->hash = NULL; + root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL; + root->agg_info_list->hash = NULL; /* construct the best path for the given combination of relations */ joinrel = gimme_tree(root, tour, num_gene); @@ -118,12 +135,22 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) fitness = DBL_MAX; /* - * Restore join_rel_list to its former state, and put back original - * hashtable if any. + * Restore each of the lists in join_rel_list, agg_info_list and + * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put back + * the original hashtable if any.
*/ root->join_rel_list->items = list_truncate(root->join_rel_list->items, - savelength); - root->join_rel_list->hash = savehash; + savelength_join_rel); + root->join_rel_list->hash = savehash_join_rel; + + root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items = + list_truncate(root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].items, + savelength_grouped_rel); + root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = savehash_grouped_rel; + + root->agg_info_list->items = list_truncate(root->agg_info_list->items, + savelength_grouped_info); + root->agg_info_list->hash = savehash_grouped_info; /* release all the memory acquired within gimme_tree */ MemoryContextSwitchTo(oldcxt); @@ -279,6 +306,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene, /* Find and save the cheapest paths for this joinrel */ set_cheapest(joinrel); + /* + * Except for the topmost scan/join rel, consider generating + * partial aggregation paths for the grouped relation on top of the + * paths of this rel. After that, we're done creating paths for + * the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(joinrel->relids, root->all_query_rels)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + rel_grouped = find_grouped_rel(root, joinrel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, joinrel, + agg_info); + set_cheapest(rel_grouped); + } + } + /* Absorb new clump into old */ old_clump->joinrel = joinrel; old_clump->size += new_clump->size; diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 9219815e3d..359eee3486 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -3854,6 +3854,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) * * After that, we're done creating paths for the joinrel, so run * set_cheapest(). 
+ * + * In addition, we also run generate_grouped_paths() for the grouped + * relation of each just-processed joinrel, and run set_cheapest() for + * the grouped relation afterwards. */ foreach(lc, root->join_rel_level[lev]) { @@ -3874,6 +3878,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) /* Find and save the cheapest paths for this rel */ set_cheapest(rel); + /* + * Except for the topmost scan/join rel, consider generating + * partial aggregation paths for the grouped relation on top of the + * paths of this rel. After that, we're done creating paths for + * the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(rel->relids, root->all_query_rels)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + rel_grouped = find_grouped_rel(root, rel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, rel, + agg_info); + set_cheapest(rel_grouped); + } + } + #ifdef OPTIMIZER_DEBUG pprint(rel); #endif @@ -4742,6 +4767,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel) if (IS_DUMMY_REL(child_rel)) continue; + /* + * Except for the topmost scan/join rel, consider generating partial + * aggregation paths for the grouped relation on top of the paths of + * this partitioned child-join. After that, we're done creating paths + * for the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(IS_OTHER_REL(rel) ? 
+ rel->top_parent_relids : rel->relids, + root->all_query_rels)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + rel_grouped = find_grouped_rel(root, child_rel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, child_rel, + agg_info); + set_cheapest(rel_grouped); + } + } + #ifdef OPTIMIZER_DEBUG pprint(child_rel); #endif diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index db475e25b1..78a88c9d3b 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -16,11 +16,13 @@ #include "miscadmin.h" #include "optimizer/appendinfo.h" +#include "optimizer/cost.h" #include "optimizer/joininfo.h" #include "optimizer/pathnode.h" #include "optimizer/paths.h" #include "partitioning/partbounds.h" #include "utils/memutils.h" +#include "utils/selfuncs.h" static void make_rels_by_clause_joins(PlannerInfo *root, @@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel); static bool restriction_is_constant_false(List *restrictlist, RelOptInfo *joinrel, bool only_pushed_down); +static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, + RelOptInfo *rel2, RelOptInfo *joinrel, + SpecialJoinInfo *sjinfo, List *restrictlist); static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, RelOptInfo *joinrel, SpecialJoinInfo *sjinfo, List *restrictlist); @@ -771,6 +776,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2) return joinrel; } + /* Build a grouped join relation for 'joinrel' if possible. */ + make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo, + restrictlist); + /* Add paths to the join relation. 
*/ populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo, restrictlist); @@ -882,6 +891,128 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids, return input_relids; } +/* + * make_grouped_join_rel + * Build a grouped join relation out of 'joinrel' if eager aggregation is + * possible and the 'joinrel' can produce grouped paths. + * + * We also generate partial aggregation paths for the grouped relation by + * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by + * joining the grouped paths of 'rel2' to the plain paths of 'rel1'. + */ +static void +make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, + RelOptInfo *rel2, RelOptInfo *joinrel, + SpecialJoinInfo *sjinfo, List *restrictlist) +{ + RelOptInfo *rel_grouped; + RelAggInfo *agg_info = NULL; + RelOptInfo *rel1_grouped; + RelOptInfo *rel2_grouped; + bool rel1_empty; + bool rel2_empty; + + /* + * If there are no aggregate expressions or grouping expressions, eager + * aggregation is not possible. + */ + if (root->agg_clause_list == NIL || + root->group_expr_list == NIL) + return; + + /* + * See if we already have a grouped joinrel for this joinrel. + */ + rel_grouped = find_grouped_rel(root, joinrel->relids, &agg_info); + + /* + * Construct a new RelOptInfo for the grouped join relation if there is no + * existing one. + */ + if (rel_grouped == NULL) + { + /* + * Prepare the information we need to create grouped paths for this + * join relation. + */ + agg_info = create_rel_agg_info(root, joinrel); + if (agg_info == NULL) + return; + + /* build a grouped relation out of the plain relation */ + rel_grouped = build_grouped_rel(root, joinrel); + rel_grouped->reltarget = agg_info->target; + rel_grouped->rows = agg_info->grouped_rows; + + /* + * Make the grouped relation available for further joining or for + * acting as the upper rel representing the result of partial + * aggregation. 
+ */ + add_grouped_rel(root, rel_grouped, agg_info); + } + + Assert(agg_info != NULL); + + /* + * If we've already proven this grouped join relation is empty, we needn't + * consider any more paths for it. + */ + if (IS_DUMMY_REL(rel_grouped)) + return; + + /* retrieve the grouped relations for the two input rels */ + rel1_grouped = find_grouped_rel(root, rel1->relids, NULL); + rel2_grouped = find_grouped_rel(root, rel2->relids, NULL); + + rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped)); + rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped)); + + /* Nothing to do if there's no grouped relation. */ + if (rel1_empty && rel2_empty) + return; + + /* + * Join of two grouped relations is currently not supported. In such a + * case, grouping of one side would change the occurrence of the other + * side's aggregate transition states on the input of the final aggregation. + * This can be handled by adjusting the transition states, but it's not + * worth the effort for now. + */ + if (!rel1_empty && !rel2_empty) + return; + + /* generate partial aggregation paths for the grouped relation */ + if (!rel1_empty) + { + set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2, + sjinfo, restrictlist); + populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped, + sjinfo, restrictlist); + /* + * It shouldn't happen that we have marked rel1_grouped as dummy in + * populate_joinrel_with_paths due to provably constant-false join + * restrictions, hence we wouldn't end up with a plan that has Aggref + * in a non-Agg plan node. 
+ */ + Assert(!IS_DUMMY_REL(rel1_grouped)); + } + else if (!rel2_empty) + { + set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped, + sjinfo, restrictlist); + populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped, + sjinfo, restrictlist); + /* + * It shouldn't happen that we have marked rel2_grouped as dummy in + * populate_joinrel_with_paths due to provably constant-false join + * restrictions, hence we wouldn't end up with a plan that has Aggref + * in a non-Agg plan node. + */ + Assert(!IS_DUMMY_REL(rel2_grouped)); + } +} + /* * populate_joinrel_with_paths * Add paths to the given joinrel for given pair of joining relations. The @@ -1671,6 +1802,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, adjust_child_relids(joinrel->relids, nappinfos, appinfos))); + /* Build a grouped join relation for 'child_joinrel' if possible */ + make_grouped_join_rel(root, child_rel1, child_rel2, + child_joinrel, child_sjinfo, + child_restrictlist); + /* And make paths for the child join */ populate_joinrel_with_paths(root, child_rel1, child_rel2, child_joinrel, child_sjinfo, diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 4711f91239..b69efb3cd1 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -225,7 +225,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, RelOptInfo *partially_grouped_rel, const AggClauseCosts *agg_costs, grouping_sets_data *gd, - double dNumGroups, GroupPathExtraData *extra); static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root, RelOptInfo *grouped_rel, @@ -3999,9 +3998,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, GroupPathExtraData *extra, RelOptInfo **partially_grouped_rel_p) { - Path *cheapest_path = input_rel->cheapest_total_path; RelOptInfo *partially_grouped_rel = NULL; - double dNumGroups; PartitionwiseAggregateType patype = 
PARTITIONWISE_AGGREGATE_NONE; /* @@ -4082,23 +4079,21 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, /* Gather any partially grouped partial paths. */ if (partially_grouped_rel && partially_grouped_rel->partial_pathlist) - { gather_grouping_paths(root, partially_grouped_rel); - set_cheapest(partially_grouped_rel); - } /* - * Estimate number of groups. + * Now choose the best path(s) for partially_grouped_rel. + * + * Note that the non-partial paths can come either from the Gather above or + * from eager aggregation. */ - dNumGroups = get_number_of_groups(root, - cheapest_path->rows, - gd, - extra->targetList); + if (partially_grouped_rel && partially_grouped_rel->pathlist) + set_cheapest(partially_grouped_rel); /* Build final grouping paths */ add_paths_to_grouping_rel(root, input_rel, grouped_rel, partially_grouped_rel, agg_costs, gd, - dNumGroups, extra); + extra); /* Give a helpful error if we failed to find any implementation */ if (grouped_rel->pathlist == NIL) @@ -6967,16 +6962,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, RelOptInfo *grouped_rel, RelOptInfo *partially_grouped_rel, const AggClauseCosts *agg_costs, - grouping_sets_data *gd, double dNumGroups, + grouping_sets_data *gd, GroupPathExtraData *extra) { Query *parse = root->parse; Path *cheapest_path = input_rel->cheapest_total_path; + Path *cheapest_partially_grouped_path = NULL; ListCell *lc; bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0; bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0; List *havingQual = (List *) extra->havingQual; AggClauseCosts *agg_final_costs = &extra->agg_final_costs; + double dNumGroups = 0; + double dNumFinalGroups = 0; + + /* + * Estimate number of groups for non-split aggregation. 
+ */ + dNumGroups = get_number_of_groups(root, + cheapest_path->rows, + gd, + extra->targetList); + + if (partially_grouped_rel && partially_grouped_rel->pathlist) + { + cheapest_partially_grouped_path = + partially_grouped_rel->cheapest_total_path; + + /* + * Estimate number of groups for final phase of partial aggregation. + */ + dNumFinalGroups = + get_number_of_groups(root, + cheapest_partially_grouped_path->rows, + gd, + extra->targetList); + } if (can_sort) { @@ -7088,7 +7109,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, path = make_ordered_path(root, grouped_rel, path, - partially_grouped_rel->cheapest_total_path, + cheapest_partially_grouped_path, info->pathkeys); if (path == NULL) @@ -7105,7 +7126,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, info->clauses, havingQual, agg_final_costs, - dNumGroups)); + dNumFinalGroups)); else add_path(grouped_rel, (Path *) create_group_path(root, @@ -7113,7 +7134,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, path, info->clauses, havingQual, - dNumGroups)); + dNumFinalGroups)); } } @@ -7155,19 +7176,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, */ if (partially_grouped_rel && partially_grouped_rel->pathlist) { - Path *path = partially_grouped_rel->cheapest_total_path; - add_path(grouped_rel, (Path *) create_agg_path(root, grouped_rel, - path, + cheapest_partially_grouped_path, grouped_rel->reltarget, AGG_HASHED, AGGSPLIT_FINAL_DESERIAL, root->processed_groupClause, havingQual, agg_final_costs, - dNumGroups)); + dNumFinalGroups)); } } @@ -7217,6 +7236,21 @@ create_partial_grouping_paths(PlannerInfo *root, bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0; bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0; + /* + * The partially_grouped_rel could have been already created due to eager + * aggregation. 
+ */ + partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL); + Assert(enable_eager_aggregate || partially_grouped_rel == NULL); + + /* + * It is possible that the partially_grouped_rel created by eager + * aggregation is dummy. In this case we just set it to NULL. It might be + * created again by the following logic if possible. + */ + if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel)) + partially_grouped_rel = NULL; + /* * Consider whether we should generate partially aggregated non-partial * paths. We can only do this if we have a non-partial path, and only if @@ -7240,19 +7274,27 @@ create_partial_grouping_paths(PlannerInfo *root, * If we can't partially aggregate partial paths, and we can't partially * aggregate non-partial paths, then don't bother creating the new * RelOptInfo at all, unless the caller specified force_rel_creation. + * + * Note that the partially_grouped_rel could have been already created and + * populated with appropriate paths by eager aggregation. */ if (cheapest_total_path == NULL && cheapest_partial_path == NULL && + (partially_grouped_rel == NULL || + partially_grouped_rel->pathlist == NIL) && !force_rel_creation) return NULL; /* * Build a new upper relation to represent the result of partially - * aggregating the rows from the input relation. - */ - partially_grouped_rel = fetch_upper_rel(root, - UPPERREL_PARTIAL_GROUP_AGG, - grouped_rel->relids); + * aggregating the rows from the input relation. The relation may already + * exist due to eager aggregation, in which case we don't need to create + * it. 
+ */ + if (partially_grouped_rel == NULL) + partially_grouped_rel = fetch_upper_rel(root, + UPPERREL_PARTIAL_GROUP_AGG, + grouped_rel->relids); partially_grouped_rel->consider_parallel = grouped_rel->consider_parallel; partially_grouped_rel->reloptkind = grouped_rel->reloptkind; @@ -7261,6 +7303,14 @@ create_partial_grouping_paths(PlannerInfo *root, partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent; partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine; + /* + * Partially-grouped partial paths may have been generated by eager + * aggregation. If we find that parallelism is not possible for + * partially_grouped_rel, we need to drop these partial paths. + */ + if (!partially_grouped_rel->consider_parallel) + partially_grouped_rel->partial_pathlist = NIL; + /* * Build target list for partial aggregate paths. These paths cannot just * emit the same tlist as regular aggregate paths, because (1) we must diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index 6ba4eba224..08de77d439 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -495,6 +495,66 @@ adjust_appendrel_attrs_mutator(Node *node, return (Node *) newinfo; } + /* + * We have to process RelAggInfo nodes specially. 
+ */ + if (IsA(node, RelAggInfo)) + { + RelAggInfo *oldinfo = (RelAggInfo *) node; + RelAggInfo *newinfo = makeNode(RelAggInfo); + + /* Copy all flat-copiable fields */ + memcpy(newinfo, oldinfo, sizeof(RelAggInfo)); + + newinfo->relids = adjust_child_relids(oldinfo->relids, + context->nappinfos, + context->appinfos); + + newinfo->target = (PathTarget *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->target, + context); + + newinfo->agg_input = (PathTarget *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input, + context); + + newinfo->group_clauses = (List *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses, + context); + + newinfo->group_exprs = (List *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs, + context); + + return (Node *) newinfo; + } + + /* + * We have to process PathTarget nodes specially. + */ + if (IsA(node, PathTarget)) + { + PathTarget *oldtarget = (PathTarget *) node; + PathTarget *newtarget = makeNode(PathTarget); + + /* Copy all flat-copiable fields */ + memcpy(newtarget, oldtarget, sizeof(PathTarget)); + + if (oldtarget->sortgrouprefs) + { + Size nbytes = list_length(oldtarget->exprs) * sizeof(Index); + + newtarget->exprs = (List *) + adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs, + context); + + newtarget->sortgrouprefs = (Index *) palloc(nbytes); + memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes); + } + + return (Node *) newtarget; + } + /* * NOTE: we do not need to recurse into sublinks, because they should * already have been converted to subplans before we see them. 
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index f8f0c0fc69..91013e1a80 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -2834,8 +2834,6 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel) mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL); add_column_to_pathtarget(target, (Expr *) aggref, 0); - - result->agg_exprs = lappend(result->agg_exprs, aggref); } /* diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index fd10498028..4ce70f256d 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -1123,9 +1123,6 @@ typedef struct RelOptInfo * "group_clauses", "group_exprs" and "group_pathkeys" are lists of * SortGroupClause, the corresponding grouping expressions and PathKey * respectively. - * - * "agg_exprs" is a list of Aggref nodes for the aggregation of the relation's - * paths. */ typedef struct RelAggInfo { @@ -1161,9 +1158,6 @@ typedef struct RelAggInfo List *group_exprs; /* a list of PathKeys */ List *group_pathkeys; - - /* a list of Aggref nodes */ - List *agg_exprs; } RelAggInfo; /* -- 2.43.0 [application/octet-stream] v9-0008-Add-test-cases.patch (71.5K, 9-v9-0008-Add-test-cases.patch) download | inline diff: From 029be0fd9d784bc21677e48375ebf57bab37ae55 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Tue, 11 Jun 2024 16:16:15 +0900 Subject: [PATCH v9 08/10] Add test cases --- src/test/regress/expected/eager_aggregate.out | 1293 +++++++++++++++++ src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/eager_aggregate.sql | 192 +++ 3 files changed, 1486 insertions(+), 1 deletion(-) create mode 100644 src/test/regress/expected/eager_aggregate.out create mode 100644 src/test/regress/sql/eager_aggregate.sql diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out new file mode 100644 index 0000000000..7a28287522 --- /dev/null +++ 
b/src/test/regress/expected/eager_aggregate.out @@ -0,0 +1,1293 @@ +-- +-- EAGER AGGREGATION +-- Test we can push aggregation down below join +-- +-- Enable eager aggregation, which by default is disabled. +SET enable_eager_aggregate TO on; +CREATE TABLE eager_agg_t1 (a int, b int, c double precision); +CREATE TABLE eager_agg_t2 (a int, b int, c double precision); +CREATE TABLE eager_agg_t3 (a int, b int, c double precision); +INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i; +ANALYZE eager_agg_t1; +ANALYZE eager_agg_t2; +ANALYZE eager_agg_t3; +-- +-- Test eager aggregation over base rel +-- +-- Perform scan of a table, aggregate the result, join it to the other table +-- and finalize the aggregation. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(18 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON 
t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial GroupAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Sort + Output: t2.c, t2.b + Sort Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b +(21 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +RESET enable_hashagg; +-- +-- Test eager aggregation over join rel +-- +-- Perform join of tables, aggregate the result, join it to the other table +-- and finalize the aggregation. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg((t2.c + t3.c)) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg((t2.c + t3.c))) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg((t2.c + t3.c)) + Group Key: t2.b + -> Hash Join + Output: t2.c, t3.c, t2.b + Hash Cond: (t3.a = t2.a) + -> Seq Scan on public.eager_agg_t3 t3 + Output: t3.a, t3.b, t3.c + -> Hash + Output: t2.c, t2.b, t2.a + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b, t2.a +(25 rows) + +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 497 + 2 | 499 + 3 | 501 + 4 | 503 + 5 | 505 + 6 | 507 + 7 | 509 + 8 | 511 + 9 | 513 +(9 rows) + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg((t2.c + t3.c)) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg((t2.c + t3.c))) + -> Partial GroupAggregate + 
Output: t2.b, PARTIAL avg((t2.c + t3.c)) + Group Key: t2.b + -> Sort + Output: t2.c, t3.c, t2.b + Sort Key: t2.b + -> Hash Join + Output: t2.c, t3.c, t2.b + Hash Cond: (t3.a = t2.a) + -> Seq Scan on public.eager_agg_t3 t3 + Output: t3.a, t3.b, t3.c + -> Hash + Output: t2.c, t2.b, t2.a + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b, t2.a +(28 rows) + +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 497 + 2 | 499 + 3 | 501 + 4 | 503 + 5 | 505 + 6 | 507 + 7 | 509 + 8 | 511 + 9 | 513 +(9 rows) + +RESET enable_hashagg; +-- +-- Test that eager aggregation works for outer join +-- +-- Ensure aggregation can be pushed down to the non-nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Hash Right Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(18 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 + | 505 +(10 rows) + +-- Ensure aggregation cannot be pushed down to the nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; + QUERY PLAN 
+------------------------------------------------------------ + Sort + Output: t2.b, (avg(t2.c)) + Sort Key: t2.b + -> HashAggregate + Output: t2.b, avg(t2.c) + Group Key: t2.b + -> Hash Right Join + Output: t2.b, t2.c + Hash Cond: (t2.b = t1.b) + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c + -> Hash + Output: t1.b + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.b +(15 rows) + +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; + b | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 + | +(10 rows) + +-- +-- Test that eager aggregation works for parallel plans +-- +SET parallel_setup_cost=0; +SET parallel_tuple_cost=0; +SET min_parallel_table_scan_size=0; +SET max_parallel_workers_per_gather=4; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +--------------------------------------------------------------------------------- + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Gather + Output: t1.a, (PARTIAL avg(t2.c)) + Workers Planned: 2 + -> Parallel Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Parallel Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Parallel Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Parallel Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(21 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +RESET parallel_setup_cost; +RESET parallel_tuple_cost; +RESET min_parallel_table_scan_size; 
+RESET max_parallel_workers_per_gather; +DROP TABLE eager_agg_t1; +DROP TABLE eager_agg_t2; +DROP TABLE eager_agg_t3; +-- +-- Test eager aggregation for partitionwise join +-- +-- Enable partitionwise aggregate, which by default is disabled. +SET enable_partitionwise_aggregate TO true; +-- Enable partitionwise join, which by default is disabled. +SET enable_partitionwise_join TO true; +CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20); +CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30); +CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y); +CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20); +CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30); +INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i; +INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i; +ANALYZE eager_agg_tab1; +ANALYZE eager_agg_tab2; +-- When GROUP BY clause matches; full aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum(t1.y)), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum(t1.y), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + Hash Cond: (t2.y = t1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2 + Output: t2.y + -> Hash + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*) + Group Key: t1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.x, t1.y + -> Finalize HashAggregate + Output: t1_1.x, sum(t1_1.y), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_1 + Output: t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.x, t1_1.y + -> Finalize HashAggregate + Output: t1_2.x, sum(t1_2.y), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_2 + Output: t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.x, t1_2.y +(49 rows) + +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+------+------- + 0 | 500 | 100 + 
6 | 1100 | 100 + 12 | 700 | 100 + 18 | 1300 | 100 + 24 | 900 | 100 +(5 rows) + +-- GROUP BY having other matching key +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t2.y, (sum(t1.y)), (count(*)) + Sort Key: t2.y + -> Append + -> Finalize HashAggregate + Output: t2.y, sum(t1.y), count(*) + Group Key: t2.y + -> Hash Join + Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + Hash Cond: (t2.y = t1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2 + Output: t2.y + -> Hash + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*) + Group Key: t1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.y, t1.x + -> Finalize HashAggregate + Output: t2_1.y, sum(t1_1.y), count(*) + Group Key: t2_1.y + -> Hash Join + Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_1 + Output: t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.y, t1_1.x + -> Finalize HashAggregate + Output: t2_2.y, sum(t1_2.y), count(*) + Group Key: t2_2.y + -> Hash Join + Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_2 + Output: t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.y, t1_2.x +(49 rows) + +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, 
eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; + y | sum | count +----+------+------- + 0 | 500 | 100 + 6 | 1100 | 100 + 12 | 700 | 100 + 18 | 1300 | 100 + 24 | 900 | 100 +(5 rows) + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; + QUERY PLAN +------------------------------------------------------------------------------------------------------------ + Sort + Output: t2.x, (sum(t1.x)), (count(*)) + Sort Key: t2.x + -> Finalize HashAggregate + Output: t2.x, sum(t1.x), count(*) + Group Key: t2.x + Filter: (avg(t1.x) > '10'::numeric) + -> Append + -> Hash Join + Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2_1 + Output: t2_1.x, t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1_1 + Output: t1_1.x + -> Hash Join + Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_2 + Output: t2_2.x, t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_2 + Output: t1_2.x + -> Hash Join + Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x)) + Hash Cond: (t2_3.y = t1_3.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_3 + Output: t2_3.x, t2_3.y + -> Hash + Output: t1_3.x, (PARTIAL 
sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x)) + -> Partial HashAggregate + Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x) + Group Key: t1_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_3 + Output: t1_3.x +(44 rows) + +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; + x | sum | count +----+------+------- + 2 | 600 | 50 + 4 | 1200 | 50 + 8 | 900 | 50 + 12 | 600 | 50 + 14 | 1200 | 50 + 18 | 900 | 50 +(6 rows) + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +------------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum((t2.y + t3.y))) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum((t2.y + t3.y)) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum((t2.y + t3.y))) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))) + -> Partial HashAggregate + Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)) + Group Key: t2.x + -> Hash Join + Output: t2.y, t2.x, t3.y, t3.x + Hash Cond: (t2.x = t3.x) + -> Seq Scan on public.eager_agg_tab1_p1 t2 + Output: t2.y, t2.x + -> Hash + Output: t3.y, t3.x + -> Seq Scan on public.eager_agg_tab1_p1 t3 + Output: t3.y, t3.x + -> Finalize HashAggregate + Output: t1_1.x, sum((t2_1.y + t3_1.y)) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))) + -> Partial HashAggregate + Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + 
t3_1.y)) + Group Key: t2_1.x + -> Hash Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab1_p2 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t3_1 + Output: t3_1.y, t3_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum((t2_2.y + t3_2.y)) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))) + -> Partial HashAggregate + Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)) + Group Key: t2_2.x + -> Hash Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab1_p3 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, t3_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t3_2 + Output: t3_2.y, t3_2.x +(70 rows) + +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + x | sum +----+------- + 0 | 10000 + 2 | 14000 + 4 | 18000 + 6 | 22000 + 8 | 26000 + 10 | 10000 + 12 | 14000 + 14 | 18000 + 16 | 22000 + 18 | 26000 + 20 | 10000 + 22 | 14000 + 24 | 18000 + 26 | 22000 + 28 | 26000 +(15 rows) + +-- partial aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + QUERY PLAN +------------------------------------------------------------------------------------------- + Finalize GroupAggregate + Output: t3.y, sum((t2.y + t3.y)) + Group Key: t3.y + -> Sort + Output: t3.y, (PARTIAL sum((t2.y + t3.y))) + Sort Key: t3.y + -> Append + -> Hash Join + Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))) + Hash Cond: (t2_1.x = t1_1.x) + -> 
Partial GroupAggregate + Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)) + Group Key: t3_1.y, t2_1.x, t3_1.x + -> Sort + Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x + Sort Key: t3_1.y, t2_1.x + -> Hash Join + Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab1_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t3_1 + Output: t3_1.y, t3_1.x + -> Hash + Output: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1_1 + Output: t1_1.x + -> Hash Join + Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))) + Hash Cond: (t2_2.x = t1_2.x) + -> Partial GroupAggregate + Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)) + Group Key: t3_2.y, t2_2.x, t3_2.x + -> Sort + Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x + Sort Key: t3_2.y, t2_2.x + -> Hash Join + Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab1_p2 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, t3_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t3_2 + Output: t3_2.y, t3_2.x + -> Hash + Output: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_2 + Output: t1_2.x + -> Hash Join + Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))) + Hash Cond: (t2_3.x = t1_3.x) + -> Partial GroupAggregate + Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)) + Group Key: t3_3.y, t2_3.x, t3_3.x + -> Sort + Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x + Sort Key: t3_3.y, t2_3.x + -> Hash Join + Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x + Hash Cond: (t2_3.x = t3_3.x) + -> Seq Scan on public.eager_agg_tab1_p3 t2_3 + Output: t2_3.y, t2_3.x + -> Hash + Output: t3_3.y, t3_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t3_3 + Output: t3_3.y, t3_3.x + -> Hash + Output: t1_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_3 + Output: t1_3.x +(73 rows) + +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN 
eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + y | sum +----+------- + 0 | 7500 + 2 | 13500 + 4 | 19500 + 6 | 25500 + 8 | 31500 + 10 | 22500 + 12 | 28500 + 14 | 34500 + 16 | 40500 + 18 | 46500 +(10 rows) + +RESET enable_hashagg; +DROP TABLE eager_agg_tab1; +DROP TABLE eager_agg_tab2; +-- +-- Test with multi-level partitioning scheme +-- +CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15); +CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20); +CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25); +CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30); +INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i; +ANALYZE eager_agg_tab_ml; +-- When GROUP BY clause matches; full aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum(t2.y)), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum(t2.y), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*)) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*) + Group Key: t2.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t2 + Output: t2.y, t2.x + -> Finalize HashAggregate + Output: t1_1.x, sum(t2_1.y), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*) + Group Key: t2_1.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1 + Output: t2_1.y, t2_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum(t2_2.y), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*) + Group Key: t2_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2 + Output: t2_2.y, t2_2.x + -> Finalize HashAggregate + Output: t1_3.x, sum(t2_3.y), count(*) + Group Key: t1_3.x + -> Hash Join + Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + 
Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3 + Output: t1_3.x + -> Hash + Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*) + Group Key: t2_3.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3 + Output: t2_3.y, t2_3.x + -> Finalize HashAggregate + Output: t1_4.x, sum(t2_4.y), count(*) + Group Key: t1_4.x + -> Hash Join + Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4 + Output: t1_4.x + -> Hash + Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*) + Group Key: t2_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4 + Output: t2_4.y, t2_4.x +(79 rows) + +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+-------+------- + 0 | 0 | 1089 + 1 | 1156 | 1156 + 2 | 2312 | 1156 + 3 | 3468 | 1156 + 4 | 4624 | 1156 + 5 | 5780 | 1156 + 6 | 6936 | 1156 + 7 | 8092 | 1156 + 8 | 9248 | 1156 + 9 | 10404 | 1156 + 10 | 11560 | 1156 + 11 | 11979 | 1089 + 12 | 13068 | 1089 + 13 | 14157 | 1089 + 14 | 15246 | 1089 + 15 | 16335 | 1089 + 16 | 17424 | 1089 + 17 | 18513 | 1089 + 18 | 19602 | 1089 + 19 | 20691 | 1089 + 20 | 21780 | 1089 + 21 | 22869 | 1089 + 22 | 23958 | 1089 + 23 | 25047 | 1089 + 24 | 26136 | 1089 + 25 | 27225 | 1089 + 26 | 28314 | 1089 + 27 | 29403 | 1089 + 28 | 30492 | 1089 + 29 | 31581 | 1089 +(30 rows) + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.y, (sum(t2.y)), (count(*)) + Sort Key: t1.y + -> Finalize HashAggregate + Output: t1.y, sum(t2.y), count(*) + Group Key: t1.y + -> Append + -> Hash Join + Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1 + Output: t1_1.y, t1_1.x + -> Hash + Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*) + Group Key: t2_1.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash Join + Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2 + Output: t1_2.y, t1_2.x + -> Hash + Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*) + Group Key: t2_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2 + Output: t2_2.y, t2_2.x + -> Hash Join + Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3 + Output: t1_3.y, t1_3.x + -> Hash + Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*) + Group Key: t2_3.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3 + Output: t2_3.y, t2_3.x + -> Hash Join + Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4 + Output: t1_4.y, t1_4.x + -> Hash + Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + -> Partial HashAggregate + 
Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*) + Group Key: t2_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4 + Output: t2_4.y, t2_4.x + -> Hash Join + Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*)) + Hash Cond: (t1_5.x = t2_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5 + Output: t1_5.y, t1_5.x + -> Hash + Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*) + Group Key: t2_5.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5 + Output: t2_5.y, t2_5.x +(67 rows) + +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; + y | sum | count +----+-------+------- + 0 | 0 | 1089 + 1 | 1156 | 1156 + 2 | 2312 | 1156 + 3 | 3468 | 1156 + 4 | 4624 | 1156 + 5 | 5780 | 1156 + 6 | 6936 | 1156 + 7 | 8092 | 1156 + 8 | 9248 | 1156 + 9 | 10404 | 1156 + 10 | 11560 | 1156 + 11 | 11979 | 1089 + 12 | 13068 | 1089 + 13 | 14157 | 1089 + 14 | 15246 | 1089 + 15 | 16335 | 1089 + 16 | 17424 | 1089 + 17 | 18513 | 1089 + 18 | 19602 | 1089 + 19 | 20691 | 1089 + 20 | 21780 | 1089 + 21 | 22869 | 1089 + 22 | 23958 | 1089 + 23 | 25047 | 1089 + 24 | 26136 | 1089 + 25 | 27225 | 1089 + 26 | 28314 | 1089 + 27 | 29403 | 1089 + 28 | 30492 | 1089 + 29 | 31581 | 1089 +(30 rows) + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +---------------------------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum((t2.y + t3.y))), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum((t2.y + t3.y)), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL 
count(*)) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*) + Group Key: t2.x + -> Hash Join + Output: t2.y, t2.x, t3.y, t3.x + Hash Cond: (t2.x = t3.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t2 + Output: t2.y, t2.x + -> Hash + Output: t3.y, t3.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t3 + Output: t3.y, t3.x + -> Finalize HashAggregate + Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*) + Group Key: t2_1.x + -> Hash Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1 + Output: t3_1.y, t3_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*) + Group Key: t2_2.x + -> Hash Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, 
t3_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2 + Output: t3_2.y, t3_2.x + -> Finalize HashAggregate + Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*) + Group Key: t1_3.x + -> Hash Join + Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3 + Output: t1_3.x + -> Hash + Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*) + Group Key: t2_3.x + -> Hash Join + Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x + Hash Cond: (t2_3.x = t3_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3 + Output: t2_3.y, t2_3.x + -> Hash + Output: t3_3.y, t3_3.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3 + Output: t3_3.y, t3_3.x + -> Finalize HashAggregate + Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*) + Group Key: t1_4.x + -> Hash Join + Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4 + Output: t1_4.x + -> Hash + Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*) + Group Key: t2_4.x + -> Hash Join + Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x + Hash Cond: (t2_4.x = t3_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4 + Output: t2_4.y, t2_4.x + -> Hash + Output: t3_4.y, t3_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4 + Output: t3_4.y, t3_4.x +(114 rows) + +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+---------+------- + 0 | 0 | 35937 + 1 | 78608 | 39304 + 2 | 157216 | 39304 + 3 | 235824 | 39304 + 4 | 314432 | 39304 + 5 | 393040 | 39304 + 6 | 471648 | 39304 
+ 7 | 550256 | 39304 + 8 | 628864 | 39304 + 9 | 707472 | 39304 + 10 | 786080 | 39304 + 11 | 790614 | 35937 + 12 | 862488 | 35937 + 13 | 934362 | 35937 + 14 | 1006236 | 35937 + 15 | 1078110 | 35937 + 16 | 1149984 | 35937 + 17 | 1221858 | 35937 + 18 | 1293732 | 35937 + 19 | 1365606 | 35937 + 20 | 1437480 | 35937 + 21 | 1509354 | 35937 + 22 | 1581228 | 35937 + 23 | 1653102 | 35937 + 24 | 1724976 | 35937 + 25 | 1796850 | 35937 + 26 | 1868724 | 35937 + 27 | 1940598 | 35937 + 28 | 2012472 | 35937 + 29 | 2084346 | 35937 +(30 rows) + +-- partial aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------ + Sort + Output: t3.y, (sum((t2.y + t3.y))), (count(*)) + Sort Key: t3.y + -> Finalize HashAggregate + Output: t3.y, sum((t2.y + t3.y)), count(*) + Group Key: t3.y + -> Append + -> Hash Join + Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1 + Output: t1_1.x + -> Hash + Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*) + Group Key: t3_1.y, t2_1.x, t3_1.x + -> Hash Join + Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1 + Output: t3_1.y, t3_1.x + -> Hash Join + Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2 + Output: t1_2.x + -> Hash + Output: t3_2.y, t2_2.x, t3_2.x, 
(PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*) + Group Key: t3_2.y, t2_2.x, t3_2.x + -> Hash Join + Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, t3_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2 + Output: t3_2.y, t3_2.x + -> Hash Join + Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3 + Output: t1_3.x + -> Hash + Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*) + Group Key: t3_3.y, t2_3.x, t3_3.x + -> Hash Join + Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x + Hash Cond: (t2_3.x = t3_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3 + Output: t2_3.y, t2_3.x + -> Hash + Output: t3_3.y, t3_3.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3 + Output: t3_3.y, t3_3.x + -> Hash Join + Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4 + Output: t1_4.x + -> Hash + Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*) + Group Key: t3_4.y, t2_4.x, t3_4.x + -> Hash Join + Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x + Hash Cond: (t2_4.x = t3_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4 + Output: t2_4.y, t2_4.x + -> Hash + Output: t3_4.y, t3_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4 + Output: t3_4.y, t3_4.x + -> Hash Join + Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*)) + Hash Cond: (t1_5.x = 
t2_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5 + Output: t1_5.x + -> Hash + Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*) + Group Key: t3_5.y, t2_5.x, t3_5.x + -> Hash Join + Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x + Hash Cond: (t2_5.x = t3_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5 + Output: t2_5.y, t2_5.x + -> Hash + Output: t3_5.y, t3_5.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5 + Output: t3_5.y, t3_5.x +(102 rows) + +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + y | sum | count +----+---------+------- + 0 | 0 | 35937 + 1 | 78608 | 39304 + 2 | 157216 | 39304 + 3 | 235824 | 39304 + 4 | 314432 | 39304 + 5 | 393040 | 39304 + 6 | 471648 | 39304 + 7 | 550256 | 39304 + 8 | 628864 | 39304 + 9 | 707472 | 39304 + 10 | 786080 | 39304 + 11 | 790614 | 35937 + 12 | 862488 | 35937 + 13 | 934362 | 35937 + 14 | 1006236 | 35937 + 15 | 1078110 | 35937 + 16 | 1149984 | 35937 + 17 | 1221858 | 35937 + 18 | 1293732 | 35937 + 19 | 1365606 | 35937 + 20 | 1437480 | 35937 + 21 | 1509354 | 35937 + 22 | 1581228 | 35937 + 23 | 1653102 | 35937 + 24 | 1724976 | 35937 + 25 | 1796850 | 35937 + 26 | 1868724 | 35937 + 27 | 1940598 | 35937 + 28 | 2012472 | 35937 + 29 | 2084346 | 35937 +(30 rows) + +DROP TABLE eager_agg_tab_ml; diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 2429ec2bba..d5697e5655 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr # The stats test resets stats, so nothing else needing stats access can be in # this group. 
 # ----------
-test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate

 # event_trigger depends on create_am and cannot run concurrently with
 # any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
--
2.43.0

[application/octet-stream] v9-0009-Add-README.patch (4.8K, 10-v9-0009-Add-README.patch) download | inline diff:
From 03382212e277487b28108b394a0a41e386db732d Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 11 Jun 2024 16:17:27 +0900
Subject: [PATCH v9 09/10] Add README

---
 src/backend/optimizer/README | 88 ++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index 2ab4f3dbf3..dae7b87f32 100644
--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
@@ -1497,3 +1497,91 @@ breaking down aggregation or grouping over a partitioned relation into
 aggregation or grouping over its partitions is called partitionwise
 aggregation. Especially when the partition keys match the GROUP BY clause,
 this can be significantly faster than the regular method.
+
+Eager aggregation
+-------------------
+
+The obvious way to evaluate aggregates is to evaluate the FROM clause of the
+SQL query (this is what query_planner does) and use the resulting paths as the
+input of Agg node. However, if the groups are large enough, it may be more
+efficient to apply the partial aggregation to the output of base relation
+scan, and finalize it when we have all relations of the query joined:
+
+  EXPLAIN (COSTS OFF)
+  SELECT a.i, avg(b.y)
+  FROM a JOIN b ON a.i = b.j
+  GROUP BY a.i;
+
+  Finalize HashAggregate
+    Group Key: a.i
+    ->  Nested Loop
+          ->  Partial HashAggregate
+                Group Key: b.j
+                ->  Seq Scan on b
+          ->  Index Only Scan using a_pkey on a
+                Index Cond: (i = b.j)
+
+Thus the join above the partial aggregate node receives fewer input rows, and
+so the number of outer-to-inner pairs of tuples to be checked can be
+significantly lower, which can in turn lead to considerably lower join cost.
+
+Note that the GROUP BY expression might not be useful for the partial
+aggregate. In the example above, the aggregate avg(b.y) references table "b",
+but the GROUP BY expression mentions "a".
However, the equivalence class {a.i, +b.j} allows us to use the b.j column as a grouping key for the partial +aggregation of the "b" table. The equivalence class mechanism is suitable +because it's designed to derive join clauses, and at the same time the join +clauses determine the choice of grouping columns of the partial aggregate: the +only way for the partial aggregate to provide upper join(s) with input values +is to have the join input expression(s) in the grouping key; besides grouping +columns, the partial aggregate can only produce the transient states of the +aggregate functions, but aggregate functions cannot be referenced by the JOIN +clauses. + +Regarding correctness, join node considers the output of the partial aggregate +to be equivalent to the output of a plain (non-aggregated) relation scan. That +is, a group (i.e. a row of the partial aggregate output) matches the other +side of the join if and only if each row of the non-aggregate relation +does. In other words, all rows belonging to the same group have the same value +of the join columns (As mentioned above, a join cannot reference other output +expressions of the partial aggregate than the grouping expressions.). + +However, there's a restriction from the aggregate's perspective: the aggregate +cannot be pushed down if any column referenced by either grouping expression +or aggregate function can be set to NULL by an outer join above the relation +to which we want to apply the partial aggregation. The point is that those +NULL values would not appear on the input of the pushed-down, so it could +either put the rows into groups in a different way than the aggregate at the +top of the plan, or it could compute wrong values of the aggregate functions. 
+ +Besides base relation, the aggregation can also be pushed down to join: + + EXPLAIN (COSTS OFF) + SELECT a.i, avg(b.y + c.z) + FROM a JOIN b ON a.i = b.j + JOIN c ON b.j = c.i + GROUP BY a.i; + + Finalize HashAggregate + Group Key: a.i + -> Nested Loop + -> Partial HashAggregate + Group Key: b.j + -> Hash Join + Hash Cond: (b.j = c.i) + -> Seq Scan on b + -> Hash + -> Seq Scan on c + -> Index Only Scan using a_pkey on a + Index Cond: (i = b.j) + +Whether the Agg node is created out of base relation or out of join, it's +added to a separate RelOptInfo that we call "grouped relation". Grouped +relation can be joined to a non-grouped relation, which results in a grouped +relation too. Join of two grouped relations does not seem to be very useful +and is currently not supported. + +If query_planner produces a grouped relation that contains valid paths, these +are simply added to the UPPERREL_PARTIAL_GROUP_AGG relation. Further +processing of these paths then does not differ from processing of other +partially grouped paths. 
-- 2.43.0 [application/octet-stream] v9-0010-Run-pgindent.patch (25.1K, 11-v9-0010-Run-pgindent.patch) download | inline diff: From 3f807e0601224654085f947bab5d4def3bde4fdf Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Wed, 3 Jul 2024 16:24:39 +0900 Subject: [PATCH v9 10/10] Run pgindent --- src/backend/optimizer/geqo/geqo_eval.c | 19 ++++--- src/backend/optimizer/path/allpaths.c | 74 ++++++++++++------------- src/backend/optimizer/path/joinrels.c | 20 ++++--- src/backend/optimizer/plan/initsplan.c | 24 ++++---- src/backend/optimizer/plan/planner.c | 8 +-- src/backend/optimizer/util/appendinfo.c | 2 +- src/backend/optimizer/util/relnode.c | 69 +++++++++++------------ src/include/nodes/pathnodes.h | 10 ++-- src/tools/pgindent/typedefs.list | 4 ++ 9 files changed, 119 insertions(+), 111 deletions(-) diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c index 278857d767..2851bed282 100644 --- a/src/backend/optimizer/geqo/geqo_eval.c +++ b/src/backend/optimizer/geqo/geqo_eval.c @@ -87,8 +87,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) * already contain some entries. The newly added entries will be recycled * by the MemoryContextDelete below, so we must ensure that each list of * the RelInfoList structures is restored to its former state before - * exiting. We can do this by truncating each list to its original length. - * NOTE this assumes that any added entries are appended at the end! + * exiting. We can do this by truncating each list to its original + * length. NOTE this assumes that any added entries are appended at the + * end! * * We also must take care not to mess up the outer hash tables of the * RelInfoList structures, if any. 
We can do this by just temporarily @@ -136,8 +137,8 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) /* * Restore each of the list in join_rel_list, agg_info_list and - * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put back - * original hashtable if any. + * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put + * back original hashtable if any. */ root->join_rel_list->items = list_truncate(root->join_rel_list->items, savelength_join_rel); @@ -308,14 +309,14 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene, /* * Except for the topmost scan/join rel, consider generating - * partial aggregation paths for the grouped relation on top of the - * paths of this rel. After that, we're done creating paths for - * the grouped relation, so run set_cheapest(). + * partial aggregation paths for the grouped relation on top + * of the paths of this rel. After that, we're done creating + * paths for the grouped relation, so run set_cheapest(). 
*/ if (!bms_equal(joinrel->relids, root->all_query_rels)) { - RelOptInfo *rel_grouped; - RelAggInfo *agg_info; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; rel_grouped = find_grouped_rel(root, joinrel->relids, &agg_info); diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 359eee3486..3602dcacfa 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -360,19 +360,19 @@ setup_base_grouped_rels(PlannerInfo *root) for (rti = 1; rti < root->simple_rel_array_size; rti++) { - RelOptInfo *rel = root->simple_rel_array[rti]; - RelOptInfo *rel_grouped; - RelAggInfo *agg_info; + RelOptInfo *rel = root->simple_rel_array[rti]; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; /* there may be empty slots corresponding to non-baserel RTEs */ if (rel == NULL) continue; - Assert(rel->relid == rti); /* sanity check on array */ + Assert(rel->relid == rti); /* sanity check on array */ /* - * Ignore RTEs that are not simple rels. Note that we need to consider - * "other rels" here. + * Ignore RTEs that are not simple rels. Note that we need to + * consider "other rels" here. */ if (!IS_SIMPLE_REL(rel)) continue; @@ -1366,8 +1366,8 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel) { - RelOptInfo *rel_grouped; - RelAggInfo *agg_info; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; /* Add paths to the grouped base relation if one exists. */ rel_grouped = find_grouped_rel(root, rel->relids, @@ -3419,8 +3419,8 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs); /* - * Determine whether it's possible to perform sort-based implementations of - * grouping. + * Determine whether it's possible to perform sort-based implementations + * of grouping. 
*/ can_sort = grouping_is_sortable(agg_info->group_clauses); @@ -3481,9 +3481,9 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, int presorted_keys; /* - * Since the path originates from the non-grouped relation which is - * not aware of eager aggregation, we must ensure that it provides - * the correct input for the partial aggregation. + * Since the path originates from the non-grouped relation which + * is not aware of eager aggregation, we must ensure that it + * provides the correct input for the partial aggregation. */ path = (Path *) create_projection_path(root, rel_grouped, @@ -3527,8 +3527,8 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, } /* - * qual is NIL because the HAVING clause cannot be evaluated until the - * final value of the aggregate is known. + * qual is NIL because the HAVING clause cannot be evaluated until + * the final value of the aggregate is known. */ path = (Path *) create_agg_path(root, rel_grouped, @@ -3558,9 +3558,9 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, int presorted_keys; /* - * Since the path originates from the non-grouped relation which is - * not aware of eager aggregation, we must ensure that it provides - * the correct input for the partial aggregation. + * Since the path originates from the non-grouped relation which + * is not aware of eager aggregation, we must ensure that it + * provides the correct input for the partial aggregation. */ path = (Path *) create_projection_path(root, rel_grouped, @@ -3605,8 +3605,8 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, } /* - * qual is NIL because the HAVING clause cannot be evaluated until the - * final value of the aggregate is known. + * qual is NIL because the HAVING clause cannot be evaluated until + * the final value of the aggregate is known. 
*/ path = (Path *) create_agg_path(root, rel_grouped, @@ -3628,12 +3628,12 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, */ if (can_hash && cheapest_total_path != NULL) { - Path *path; + Path *path; /* * Since the path originates from the non-grouped relation which is - * not aware of eager aggregation, we must ensure that it provides - * the correct input for the partial aggregation. + * not aware of eager aggregation, we must ensure that it provides the + * correct input for the partial aggregation. */ path = (Path *) create_projection_path(root, rel_grouped, @@ -3641,8 +3641,8 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, agg_info->agg_input); /* - * qual is NIL because the HAVING clause cannot be evaluated until - * the final value of the aggregate is known. + * qual is NIL because the HAVING clause cannot be evaluated until the + * final value of the aggregate is known. */ path = (Path *) create_agg_path(root, rel_grouped, @@ -3663,12 +3663,12 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, */ if (can_hash && cheapest_partial_path != NULL) { - Path *path; + Path *path; /* * Since the path originates from the non-grouped relation which is - * not aware of eager aggregation, we must ensure that it provides - * the correct input for the partial aggregation. + * not aware of eager aggregation, we must ensure that it provides the + * correct input for the partial aggregation. */ path = (Path *) create_projection_path(root, rel_grouped, @@ -3676,8 +3676,8 @@ generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, agg_info->agg_input); /* - * qual is NIL because the HAVING clause cannot be evaluated until - * the final value of the aggregate is known. + * qual is NIL because the HAVING clause cannot be evaluated until the + * final value of the aggregate is known. 
*/ path = (Path *) create_agg_path(root, rel_grouped, @@ -3880,14 +3880,14 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) /* * Except for the topmost scan/join rel, consider generating - * partial aggregation paths for the grouped relation on top of the - * paths of this rel. After that, we're done creating paths for - * the grouped relation, so run set_cheapest(). + * partial aggregation paths for the grouped relation on top of + * the paths of this rel. After that, we're done creating paths + * for the grouped relation, so run set_cheapest(). */ if (!bms_equal(rel->relids, root->all_query_rels)) { - RelOptInfo *rel_grouped; - RelAggInfo *agg_info; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; rel_grouped = find_grouped_rel(root, rel->relids, &agg_info); @@ -4777,8 +4777,8 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel) rel->top_parent_relids : rel->relids, root->all_query_rels)) { - RelOptInfo *rel_grouped; - RelAggInfo *agg_info; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; rel_grouped = find_grouped_rel(root, child_rel->relids, &agg_info); diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 78a88c9d3b..23bbef15f0 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -905,12 +905,12 @@ make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, RelOptInfo *joinrel, SpecialJoinInfo *sjinfo, List *restrictlist) { - RelOptInfo *rel_grouped; - RelAggInfo *agg_info = NULL; - RelOptInfo *rel1_grouped; - RelOptInfo *rel2_grouped; - bool rel1_empty; - bool rel2_empty; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info = NULL; + RelOptInfo *rel1_grouped; + RelOptInfo *rel2_grouped; + bool rel1_empty; + bool rel2_empty; /* * If there are no aggregate expressions or grouping expressions, eager @@ -975,9 +975,9 @@ make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, /* * Join of two grouped 
relations is currently not supported. In such a * case, grouping of one side would change the occurrence of the other - * side's aggregate transient states on the input of the final aggregation. - * This can be handled by adjusting the transient states, but it's not - * worth the effort for now. + * side's aggregate transient states on the input of the final + * aggregation. This can be handled by adjusting the transient states, but + * it's not worth the effort for now. */ if (!rel1_empty && !rel2_empty) return; @@ -989,6 +989,7 @@ make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, sjinfo, restrictlist); populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped, sjinfo, restrictlist); + /* * It shouldn't happen that we have marked rel1_grouped as dummy in * populate_joinrel_with_paths due to provably constant-false join @@ -1003,6 +1004,7 @@ make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, sjinfo, restrictlist); populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped, sjinfo, restrictlist); + /* * It shouldn't happen that we have marked rel2_grouped as dummy in * populate_joinrel_with_paths due to provably constant-false join diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 9f05edfbac..f093ef0f13 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -430,9 +430,9 @@ create_agg_clause_infos(PlannerInfo *root) } /* - * Aggregates within the HAVING clause need to be processed in the same way - * as those in the targetlist. Note that HAVING can contain Aggrefs but - * not WindowFuncs. + * Aggregates within the HAVING clause need to be processed in the same + * way as those in the targetlist. Note that HAVING can contain Aggrefs + * but not WindowFuncs. 
*/ if (root->parse->havingQual != NULL) { @@ -504,10 +504,10 @@ create_grouping_expr_infos(PlannerInfo *root) SortGroupClause *sgc = lfirst_node(SortGroupClause, lc); TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist); TypeCacheEntry *tce; - Oid equalimageproc; - Oid eq_op; - List *eq_opfamilies; - Oid btree_opfamily; + Oid equalimageproc; + Oid eq_op; + List *eq_opfamilies; + Oid btree_opfamily; Assert(tle->ressortgroupref > 0); @@ -518,11 +518,11 @@ create_grouping_expr_infos(PlannerInfo *root) return; /* - * Eager aggregation is only possible if equality of grouping keys - * per the equality operator implies bitwise equality. Otherwise, if - * we put keys of different byte images into the same group, we lose - * some information that may be needed to evaluate join clauses above - * the pushed-down aggregate node, or the WHERE clause. + * Eager aggregation is only possible if equality of grouping keys per + * the equality operator implies bitwise equality. Otherwise, if we + * put keys of different byte images into the same group, we lose some + * information that may be needed to evaluate join clauses above the + * pushed-down aggregate node, or the WHERE clause. * * For example, the NUMERIC data type is not supported because values * that fall into the same group according to the equality operator diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index b69efb3cd1..72a45f1b01 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -4084,8 +4084,8 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, /* * Now choose the best path(s) for partially_grouped_rel. * - * Note that the non-partial paths can come either from the Gather above or - * from eager aggregation. + * Note that the non-partial paths can come either from the Gather above + * or from eager aggregation. 
*/ if (partially_grouped_rel && partially_grouped_rel->pathlist) set_cheapest(partially_grouped_rel); @@ -7245,8 +7245,8 @@ create_partial_grouping_paths(PlannerInfo *root, /* * It is possible that the partially_grouped_rel created by eager - * aggregation is dummy. In this case we just set it to NULL. It might be - * created again by the following logic if possible. + * aggregation is dummy. In this case we just set it to NULL. It might + * be created again by the following logic if possible. */ if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel)) partially_grouped_rel = NULL; diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index 08de77d439..27ac853c0a 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -542,7 +542,7 @@ adjust_appendrel_attrs_mutator(Node *node, if (oldtarget->sortgrouprefs) { - Size nbytes = list_length(oldtarget->exprs) * sizeof(Index); + Size nbytes = list_length(oldtarget->exprs) * sizeof(Index); newtarget->exprs = (List *) adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs, diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 91013e1a80..55ff082b9b 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -47,7 +47,7 @@ typedef struct RelInfoEntry { Relids relids; /* hash key --- MUST BE FIRST */ void *data; -} RelInfoEntry; +} RelInfoEntry; static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *input_rel, @@ -435,13 +435,13 @@ RelOptInfo * build_simple_grouped_rel(PlannerInfo *root, int relid, RelAggInfo **agg_info_p) { - RelOptInfo *rel_plain; - RelOptInfo *rel_grouped; - RelAggInfo *agg_info; + RelOptInfo *rel_plain; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; /* - * We should have available aggregate expressions and grouping expressions, - * otherwise we cannot reach here. 
+ * We should have available aggregate expressions and grouping + * expressions, otherwise we cannot reach here. */ Assert(root->agg_clause_list != NIL); Assert(root->group_expr_list != NIL); @@ -481,7 +481,7 @@ build_simple_grouped_rel(PlannerInfo *root, int relid, RelOptInfo * build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain) { - RelOptInfo *rel_grouped; + RelOptInfo *rel_grouped; rel_grouped = makeNode(RelOptInfo); memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo)); @@ -2672,13 +2672,13 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel) /* * If this is a child rel, the grouped rel for its parent rel must have - * been created if it can. So we can just use parent's RelAggInfo if there - * is one, with appropriate variable substitutions. + * been created if it can. So we can just use parent's RelAggInfo if + * there is one, with appropriate variable substitutions. */ if (IS_OTHER_REL(rel)) { - RelOptInfo *rel_grouped; - RelAggInfo *agg_info; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; Assert(!bms_is_empty(rel->top_parent_relids)); rel_grouped = find_grouped_rel(root, rel->top_parent_relids, &agg_info); @@ -2761,8 +2761,8 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel) * Initialize the SortGroupClause. * * As the final aggregation will not use this grouping expression, - * we don't care whether sortop is < or >. The value of nulls_first - * should not matter for the same reason. + * we don't care whether sortop is < or >. The value of + * nulls_first should not matter for the same reason. 
*/ cl->tleSortGroupRef = ++sortgroupref; get_sort_group_operators(var->vartype, @@ -2826,7 +2826,7 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel) foreach(lc, root->agg_clause_list) { AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); - Aggref *aggref; + Aggref *aggref; Assert(IsA(ac_info->aggref, Aggref)); @@ -2848,8 +2848,8 @@ create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel) result->agg_input = agg_input; /* - * The number of aggregation input rows is simply the number of rows of the - * non-grouped relation, which should have been estimated by now. + * The number of aggregation input rows is simply the number of rows of + * the non-grouped relation, which should have been estimated by now. */ result->input_rows = rel->rows; @@ -2920,7 +2920,8 @@ eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel) Assert(IsA(ac_info->aggref, Aggref)); /* - * Give up if any aggregate needs relations other than the current one. + * Give up if any aggregate needs relations other than the current + * one. * * If the aggregate needs the current rel plus anything else, then the * problem is that grouping of the current relation could make some @@ -3012,7 +3013,7 @@ init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, } else { - bool safe_to_push; + bool safe_to_push; if (is_var_needed_by_join(root, (Var *) expr, rel, &safe_to_push)) { @@ -3024,17 +3025,17 @@ init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, return false; /* - * The expression is needed for a join, however it's neither in - * the GROUP BY clause nor can it be derived from it using EC. - * (Otherwise it would have already been added to the targets - * above.) We need to construct a special SortGroupClause for - * this expression. + * The expression is needed for a join, however it's neither + * in the GROUP BY clause nor can it be derived from it using + * EC. (Otherwise it would have already been added to the + * targets above.) 
We need to construct a special + * SortGroupClause for this expression. * * Note that its tleSortGroupRef needs to be unique within * agg_input, so we need to postpone creation of this * SortGroupClause until we're done with the iteration of - * rel->reltarget->exprs. And it makes sense for the caller to - * do some more checks before it starts to create those + * rel->reltarget->exprs. And it makes sense for the caller + * to do some more checks before it starts to create those * SortGroupClauses. */ *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr); @@ -3045,9 +3046,9 @@ init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, * Another reason we might need this variable is that some * aggregate pushed down to this relation references it. In * such a case, add it to "agg_input", but not to "target". - * However, if the aggregate is not the only reason for the var - * to be in the target, some more checks need to be performed - * below. + * However, if the aggregate is not the only reason for the + * var to be in the target, some more checks need to be + * performed below. */ add_new_column_to_pathtarget(agg_input, expr); } @@ -3101,8 +3102,8 @@ init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, * expression but not referenced by any join. * * If the eager aggregation will support generic grouping - * expression in the future, create_rel_agg_info() will have to add - * this variable to "agg_input" target and also add the whole + * expression in the future, create_rel_agg_info() will have to + * add this variable to "agg_input" target and also add the whole * generic expression to "target". 
*/ return false; @@ -3128,7 +3129,7 @@ is_var_in_aggref_only(PlannerInfo *root, Var *var) foreach(lc, root->agg_clause_list) { AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); - List *vars; + List *vars; Assert(IsA(ac_info->aggref, Aggref)); @@ -3188,9 +3189,9 @@ is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel, attno = var->varattno - baserel->min_attr; /* - * If the baserel this Var belongs to can be nulled by outer joins that are - * above the current rel, then it is not safe to use this Var as a grouping - * key at current rel level. + * If the baserel this Var belongs to can be nulled by outer joins that + * are above the current rel, then it is not safe to use this Var as a + * grouping key at current rel level. */ *safe_to_push = bms_is_subset(baserel->nulling_relids, rel->relids); diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 4ce70f256d..e32d96769c 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -291,7 +291,7 @@ struct PlannerInfo * join_rel_list is a list of all join-relation RelOptInfos we have * considered in this planning run. */ - RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */ + RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */ /* * When doing a dynamic-programming-style join search, join_rel_level[k] @@ -435,7 +435,7 @@ struct PlannerInfo * Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular * upper rel. */ - RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); + RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); /* * list of grouped relation RelAggInfos. 
One instance of RelAggInfo per @@ -1147,10 +1147,10 @@ typedef struct RelAggInfo struct PathTarget *agg_input; /* estimated number of input tuples for the grouped paths */ - Cardinality input_rows; + Cardinality input_rows; - /* estimated number of result tuples of the grouped relation*/ - Cardinality grouped_rows; + /* estimated number of result tuples of the grouped relation */ + Cardinality grouped_rows; /* a list of SortGroupClause's */ List *group_clauses; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index e6c1caf649..4019a5fee9 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -41,6 +41,7 @@ AfterTriggersTableData AfterTriggersTransData Agg AggClauseCosts +AggClauseInfo AggInfo AggPath AggSplit @@ -1058,6 +1059,7 @@ GrantTargetType Group GroupByOrdering GroupClause +GroupExprInfo GroupPath GroupPathExtraData GroupResultPath @@ -2364,12 +2366,14 @@ ReindexObjectType ReindexParams ReindexStmt ReindexType +RelAggInfo RelFileLocator RelFileLocatorBackend RelFileNumber RelIdCacheEnt RelInfo RelInfoArr +RelInfoList RelMapFile RelMapping RelOptInfo -- 2.43.0 ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3
@ 2024-07-07 02:45 Paul George <[email protected]>
  parent: Richard Guo <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread
From: Paul George @ 2024-07-07 02:45 UTC (permalink / raw)
  To: Richard Guo <[email protected]>; +Cc: Andy Fan <[email protected]>; pgsql-hackers; [email protected]

Richard:

Thanks for reviving this patch and for all of your work on it! Eager aggregation pushdown will be beneficial for my work and I'm hoping to see it land.

I was playing around with v9 of the patches and was specifically curious about this previous statement...

>This patch also makes eager aggregation work with outer joins. With
>outer join, the aggregate cannot be pushed down if any column referenced
>by grouping expressions or aggregate functions is nullable by an outer
>join above the relation to which we want to apply the partial
>aggregation. Thanks to Tom's outer-join-aware-Var infrastructure, we
>can easily identify such situations and subsequently refrain from
>pushing down the aggregates.

...and this related comment in eager_aggregate.out:

>-- Ensure aggregation cannot be pushed down to the nullable side

While I'm new to this work and its subtleties, I'm wondering if this is too broad a condition. I modified the first test query in eager_aggregate.sql to make it a LEFT JOIN and eager aggregation indeed did not happen, which is expected based on the comments upthread.
query: SET enable_eager_aggregate=ON; EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a, sum(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; plan: QUERY PLAN ------------------------------------------------------------ GroupAggregate Output: t1.a, sum(t2.c) Group Key: t1.a -> Sort Output: t1.a, t2.c Sort Key: t1.a -> Hash Right Join Output: t1.a, t2.c Hash Cond: (t2.b = t1.b) -> Seq Scan on public.eager_agg_t2 t2 Output: t2.a, t2.b, t2.c -> Hash Output: t1.a, t1.b -> Seq Scan on public.eager_agg_t1 t1 Output: t1.a, t1.b (15 rows) (NOTE: I changed the aggregate from avg(...) to sum(...) for simplicity) But, it seems that eager aggregation for the query above can be "replicated" as: query: EXPLAIN (VERBOSE, COSTS OFF) SELECT t1.a, sum(t2.c) FROM eager_agg_t1 t1 LEFT JOIN ( SELECT b, sum(c) c FROM eager_agg_t2 t2p GROUP BY b ) t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; The output of both the original query and this one match (and the plans with eager aggregation and the subquery are nearly identical if you restore the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does this mean that there are conditions under which eager aggregation can be pushed down to the nullable side? -Paul- On Sat, Jul 6, 2024 at 4:56 PM Richard Guo <[email protected]> wrote: > On Thu, Jun 13, 2024 at 4:07 PM Richard Guo <[email protected]> > wrote: > > I spent some time testing this patchset and found a few more issues. > > ... > > > Hence here is the v8 patchset, with fixes for all the above issues. 
> > I found an 'ORDER/GROUP BY expression not found in targetlist' error > with this patchset, with the query below: > > create table t (a boolean); > > set enable_eager_aggregate to on; > > explain (costs off) > select min(1) from t t1 left join t t2 on t1.a group by (not (not > t1.a)), t1.a order by t1.a; > ERROR: ORDER/GROUP BY expression not found in targetlist > > This happens because the two grouping items are actually the same and > standard_qp_callback would remove one of them. The fully-processed > groupClause is kept in root->processed_groupClause. However, when > collecting grouping expressions in create_grouping_expr_infos, we are > checking parse->groupClause, which is incorrect. > > The fix is straightforward: check root->processed_groupClause instead. > > Here is a new rebase with this fix. > > Thanks > Richard > ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3
@ 2024-07-10 08:27 Richard Guo <[email protected]>
  parent: Paul George <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread
From: Richard Guo @ 2024-07-10 08:27 UTC (permalink / raw)
  To: Paul George <[email protected]>; +Cc: Andy Fan <[email protected]>; pgsql-hackers; [email protected]

On Sun, Jul 7, 2024 at 10:45 AM Paul George <[email protected]> wrote:
> Thanks for reviving this patch and for all of your work on it! Eager aggregation pushdown will be beneficial for my work and I'm hoping to see it land.

Thanks for looking at this patch!

> The output of both the original query and this one match (and the plans with eager aggregation and the subquery are nearly identical if you restore the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does this mean that there are conditions under which eager aggregation can be pushed down to the nullable side?

I think it's a very risky thing to push a partial aggregation down to
the nullable side of an outer join, because the NULL-extended rows
produced by the outer join would not be available when we perform the
partial aggregation, while with a non-eager-aggregation plan these
rows are available for the top-level aggregation. This may put the
rows into groups in a different way than expected, or get wrong values
from the aggregate functions. I've managed to compose an example:

create table t (a int, b int);
insert into t select 1, 1;

select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
t2.a having t2.a is null;
 a | count
---+-------
   |     1
(1 row)

This is the expected result, because after the outer join we have got
a NULL-extended row.

But if we somehow push down the partial aggregation to the nullable
side of this outer join, we would get a wrong result.

explain (costs off)
select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
t2.a having t2.a is null;
                QUERY PLAN
-------------------------------------------
 Finalize HashAggregate
   Group Key: t2.a
   ->  Nested Loop Left Join
         Filter: (t2.a IS NULL)
         ->  Seq Scan on t t1
         ->  Materialize
               ->  Partial HashAggregate
                     Group Key: t2.a
                     ->  Seq Scan on t t2
                           Filter: (b > 1)
(10 rows)

select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
t2.a having t2.a is null;
 a | count
---+-------
   |     0
(1 row)

I believe there are cases where pushing a partial aggregation down to
the nullable side of an outer join can be safe, but I doubt that there
is an easy way to identify these cases and do the push-down for them.
So for now I think we'd better refrain from doing that.

Thanks
Richard

^ permalink raw reply [nested|flat] 30+ messages in thread
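[Editorial sketch, not part of the thread] The counter-example in the message above can be traced by hand with a small C model. This is only an illustration under stated assumptions: the names Row, PartialCount, count_null_group_plain and count_null_group_eager are hypothetical and do not come from the patch, column a is assumed to be non-NULL in the stored rows, and the finalize step is assumed to skip NULL partial states, which is consistent with the count of 0 reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One row of the toy table t (a int, b int); 'a' is assumed non-NULL. */
typedef struct { int a; int b; } Row;

/* A partial count that may be NULL, as on a NULL-extended join row. */
typedef struct { bool is_null; long value; } PartialCount;

/*
 * Plain plan: do "t1 LEFT JOIN t2 ON t2.b > 1" first, then count the rows
 * that land in the group where t2.a IS NULL.  Only NULL-extended rows can
 * be in that group, one per t1 row that found no match.
 */
long count_null_group_plain(const Row *t1, size_t n1, const Row *t2, size_t n2)
{
    long count = 0;

    assert(t1 != NULL && t2 != NULL);
    for (size_t i = 0; i < n1; i++) {
        bool matched = false;

        for (size_t j = 0; j < n2; j++)
            if (t2[j].b > 1)
                matched = true;
        if (!matched)
            count++;            /* the NULL-extended row is counted here */
    }
    return count;
}

/*
 * Eager plan (hypothetical model): partially aggregate t2 below the join,
 * then left-join t1 against the partial groups and finalize.
 */
long count_null_group_eager(const Row *t1, size_t n1, const Row *t2, size_t n2)
{
    bool any_group = false;
    long final_count = 0;

    assert(t1 != NULL && t2 != NULL);

    /* Partial aggregate over t2 with filter b > 1: does any group exist? */
    for (size_t j = 0; j < n2; j++)
        if (t2[j].b > 1)
            any_group = true;

    for (size_t i = 0; i < n1; i++) {
        if (any_group)
            continue;           /* joined rows have non-NULL t2.a: filtered out */

        /* NULL-extended row: t2.a is NULL and so is the partial count. */
        PartialCount partial = { true, 0 };

        /* Assumed finalize behaviour: NULL partial states are skipped. */
        if (!partial.is_null)
            final_count += partial.value;
    }
    return final_count;
}
```

With the single row (1, 1) used in the message, calling the first function with the table on both sides gives 1 and the second gives 0. The NULL-extended row never passes through the partial aggregate, so nothing is left for the finalize step to add up.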
* Re: Eager aggregation, take 3
@ 2024-07-11 21:50 Paul George <[email protected]>
  parent: Richard Guo <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread
From: Paul George @ 2024-07-11 21:50 UTC (permalink / raw)
  To: Richard Guo <[email protected]>; +Cc: Andy Fan <[email protected]>; pgsql-hackers; [email protected]

Hey Richard,

Looking more closely at this example

>select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by t2.a having t2.a is null;

I wonder if the inability to exploit eager aggregation is more based on
the fact that COUNT(*) cannot be decomposed into an aggregation of PARTIAL
COUNT(*)s (apologies if my terminology is off/made up...I'm new to the
codebase). In other words, is it the case that a given aggregate function
already has built-in protection against the error case you correctly
pointed out?

To highlight this, in the simple example below we don't see aggregate
pushdown even with an INNER JOIN when the agg function is COUNT(*) but we
do when it's COUNT(t2.*):

-- same setup
drop table if exists t;
create table t(a int, b int, c int);
insert into t select i % 100, i % 10, i from generate_series(1, 1000) i;
analyze t;

-- query 1: COUNT(*) --> no pushdown
set enable_eager_aggregate=on;
explain (verbose, costs off) select t1.a, count(*) from t t1 join t t2 on t1.a=t2.a group by t1.a;

                QUERY PLAN
-------------------------------------------
 HashAggregate
   Output: t1.a, count(*)
   Group Key: t1.a
   ->  Hash Join
         Output: t1.a
         Hash Cond: (t1.a = t2.a)
         ->  Seq Scan on public.t t1
               Output: t1.a, t1.b, t1.c
         ->  Hash
               Output: t2.a
               ->  Seq Scan on public.t t2
                     Output: t2.a
(12 rows)

-- query 2: COUNT(t2.*) --> agg pushdown
set enable_eager_aggregate=on;
explain (verbose, costs off) select t1.a, count(t2.*) from t t1 join t t2 on t1.a=t2.a group by t1.a;

                      QUERY PLAN
-------------------------------------------------------
 Finalize HashAggregate
   Output: t1.a, count(t2.*)
   Group Key: t1.a
   ->  Hash Join
         Output: t1.a, (PARTIAL count(t2.*))
         Hash Cond: (t1.a = t2.a)
         ->  Seq Scan on public.t t1
               Output: t1.a, t1.b, t1.c
         ->  Hash
               Output: t2.a, (PARTIAL count(t2.*))
               ->  Partial HashAggregate
                     Output: t2.a, PARTIAL count(t2.*)
                     Group Key: t2.a
                     ->  Seq Scan on public.t t2
                           Output: t2.*, t2.a
(15 rows)

...while it might be true that COUNT(*) ... INNER JOIN should allow eager
agg pushdown (I haven't thought deeply about it, TBH), I did find this
result pretty interesting.

-Paul

On Wed, Jul 10, 2024 at 1:27 AM Richard Guo <[email protected]> wrote:

> On Sun, Jul 7, 2024 at 10:45 AM Paul George <[email protected]>
> wrote:
> > Thanks for reviving this patch and for all of your work on it! Eager
> aggregation pushdown will be beneficial for my work and I'm hoping to see
> it land.
>
> Thanks for looking at this patch!
>
> > The output of both the original query and this one match (and the plans
> with eager aggregation and the subquery are nearly identical if you restore
> the LEFT JOIN to a JOIN). I admittedly may be missing a subtlety, but does
> this mean that there are conditions under which eager aggregation can be
> pushed down to the nullable side?
>
> I think it's a very risky thing to push a partial aggregation down to
> the nullable side of an outer join, because the NULL-extended rows
> produced by the outer join would not be available when we perform the
> partial aggregation, while with a non-eager-aggregation plan these
> rows are available for the top-level aggregation. This may put the
> rows into groups in a different way than expected, or get wrong values
> from the aggregate functions. I've managed to compose an example:
>
> create table t (a int, b int);
> insert into t select 1, 1;
>
> select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
> t2.a having t2.a is null;
>  a | count
> ---+-------
>    |     1
> (1 row)
>
> This is the expected result, because after the outer join we have got
> a NULL-extended row.
>
> But if we somehow push down the partial aggregation to the nullable
> side of this outer join, we would get a wrong result.
>
> explain (costs off)
> select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
> t2.a having t2.a is null;
>                 QUERY PLAN
> -------------------------------------------
>  Finalize HashAggregate
>    Group Key: t2.a
>    ->  Nested Loop Left Join
>          Filter: (t2.a IS NULL)
>          ->  Seq Scan on t t1
>          ->  Materialize
>                ->  Partial HashAggregate
>                      Group Key: t2.a
>                      ->  Seq Scan on t t2
>                            Filter: (b > 1)
> (10 rows)
>
> select t2.a, count(*) from t t1 left join t t2 on t2.b > 1 group by
> t2.a having t2.a is null;
>  a | count
> ---+-------
>    |     0
> (1 row)
>
> I believe there are cases where pushing a partial aggregation down to
> the nullable side of an outer join can be safe, but I doubt that there
> is an easy way to identify these cases and do the push-down for them.
> So for now I think we'd better refrain from doing that.
>
> Thanks
> Richard
>

^ permalink raw reply [nested|flat] 30+ messages in thread
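[Editorial sketch, not part of the thread] On the open question in the message above about COUNT(*) under an INNER JOIN, the arithmetic can at least be checked with a toy C model. It is a sketch under assumptions, not a statement about the patch: the function names count_star_plain and count_star_eager are hypothetical, the tables are reduced to arrays of join-key values, and the model says nothing about why the planner in the patch does or does not choose such a plan.

```c
#include <stddef.h>

/*
 * Plain plan: enumerate every pair produced by "t1 JOIN t2 ON t1.a = t2.a"
 * for the group t1.a = key and count the pairs one by one.
 */
long count_star_plain(const int *t1, size_t n1,
                      const int *t2, size_t n2, int key)
{
    long count = 0;

    for (size_t i = 0; i < n1; i++) {
        if (t1[i] != key)
            continue;
        for (size_t j = 0; j < n2; j++)
            if (t2[j] == t1[i])
                count++;
    }
    return count;
}

/*
 * Eager plan (hypothetical model): first count the t2 rows for the key
 * (partial aggregate grouped by t2.a), then let every matching t1 row
 * carry that partial count through the join, and sum at the top.
 */
long count_star_eager(const int *t1, size_t n1,
                      const int *t2, size_t n2, int key)
{
    long partial = 0;
    long final_count = 0;

    /* Partial aggregate: number of t2 rows in the group a = key. */
    for (size_t j = 0; j < n2; j++)
        if (t2[j] == key)
            partial++;

    /* Inner join plus finalize: each matching t1 row adds the partial count. */
    for (size_t i = 0; i < n1; i++)
        if (t1[i] == key)
            final_count += partial;

    return final_count;
}
```

For key values {1, 1, 2, 3, 3, 3} on both sides, both functions return 4 for key 1, 1 for key 2, 9 for key 3, and 0 for a key that is absent. In this model the inner join never produces NULL-extended rows, so summing partial counts gives the same answer as counting joined rows; the difference from the LEFT JOIN counter-example is exactly the rows that bypass the partial aggregate.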
* Re: Eager aggregation, take 3 @ 2024-08-16 08:14 Richard Guo <[email protected]> parent: Paul George <[email protected]> 0 siblings, 1 reply; 30+ messages in thread From: Richard Guo @ 2024-08-16 08:14 UTC (permalink / raw) To: Paul George <[email protected]>; +Cc: Andy Fan <[email protected]>; pgsql-hackers; [email protected] I had a self-review of this patchset and made some refactoring, especially to the function that creates the RelAggInfo structure for a given relation. While there were no major changes, the code should now be simpler. Attached is the updated version of the patchset. Previously, the patchset was not well-split, which made it time-consuming to distribute the changes across the patches during the refactoring. So I squashed them into two patches to save effort. Thanks Richard Attachments: [application/octet-stream] v10-0001-Introduce-RelInfoList-structure.patch (14.8K, 2-v10-0001-Introduce-RelInfoList-structure.patch) download | inline diff: From 4ef89693cc376a8d16b40bf403712b5ad171471c Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Tue, 11 Jun 2024 15:59:19 +0900 Subject: [PATCH v10 1/2] Introduce RelInfoList structure This commit introduces the RelInfoList structure, which encapsulates both a list and a hash table, so that we can leverage the hash table for faster lookups not only for join relations but also for upper relations. 
--- contrib/postgres_fdw/postgres_fdw.c | 3 +- src/backend/optimizer/geqo/geqo_eval.c | 20 +-- src/backend/optimizer/path/allpaths.c | 7 +- src/backend/optimizer/plan/planmain.c | 5 +- src/backend/optimizer/util/relnode.c | 164 ++++++++++++++----------- src/include/nodes/pathnodes.h | 32 +++-- src/tools/pgindent/typedefs.list | 3 +- 7 files changed, 136 insertions(+), 98 deletions(-) diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index fc65d81e21..be4038f64f 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -6079,7 +6079,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype, */ Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */ fpinfo->relation_index = - list_length(root->parse->rtable) + list_length(root->join_rel_list); + list_length(root->parse->rtable) + + list_length(root->join_rel_list->items); return true; } diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c index d2f7f4e5f3..1141156899 100644 --- a/src/backend/optimizer/geqo/geqo_eval.c +++ b/src/backend/optimizer/geqo/geqo_eval.c @@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) * truncating the list to its original length. NOTE this assumes that any * added entries are appended at the end! * - * We also must take care not to mess up the outer join_rel_hash, if there - * is one. We can do this by just temporarily setting the link to NULL. - * (If we are dealing with enough join rels, which we very likely are, a - * new hash table will get built and used locally.) + * We also must take care not to mess up the outer join_rel_list->hash, if + * there is one. We can do this by just temporarily setting the link to + * NULL. (If we are dealing with enough join rels, which we very likely + * are, a new hash table will get built and used locally.) * * join_rel_level[] shouldn't be in use, so just Assert it isn't. 
*/ - savelength = list_length(root->join_rel_list); - savehash = root->join_rel_hash; + savelength = list_length(root->join_rel_list->items); + savehash = root->join_rel_list->hash; Assert(root->join_rel_level == NULL); - root->join_rel_hash = NULL; + root->join_rel_list->hash = NULL; /* construct the best path for the given combination of relations */ joinrel = gimme_tree(root, tour, num_gene); @@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) * Restore join_rel_list to its former state, and put back original * hashtable if any. */ - root->join_rel_list = list_truncate(root->join_rel_list, - savelength); - root->join_rel_hash = savehash; + root->join_rel_list->items = list_truncate(root->join_rel_list->items, + savelength); + root->join_rel_list->hash = savehash; /* release all the memory acquired within gimme_tree */ MemoryContextSwitchTo(oldcxt); diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 057b4b79eb..b550e707a4 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -3410,9 +3410,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist) * needed for these paths need have been instantiated. * * Note to plugin authors: the functions invoked during standard_join_search() - * modify root->join_rel_list and root->join_rel_hash. If you want to do more - * than one join-order search, you'll probably need to save and restore the - * original states of those data structures. See geqo_eval() for an example. + * modify root->join_rel_list->items and root->join_rel_list->hash. If you + * want to do more than one join-order search, you'll probably need to save and + * restore the original states of those data structures. See geqo_eval() for + * an example. 
*/ RelOptInfo * standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index e17d31a5c3..fd8b2b0ca3 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -64,8 +64,9 @@ query_planner(PlannerInfo *root, * NOTE: append_rel_list was set up by subquery_planner, so do not touch * here. */ - root->join_rel_list = NIL; - root->join_rel_hash = NULL; + root->join_rel_list = makeNode(RelInfoList); + root->join_rel_list->items = NIL; + root->join_rel_list->hash = NULL; root->join_rel_level = NULL; root->join_cur_level = 0; root->canon_pathkeys = NIL; diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index d7266e4cdb..76e13971f7 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -35,11 +35,15 @@ #include "utils/lsyscache.h" -typedef struct JoinHashEntry +/* + * An entry of a hash table that we use to make lookup for RelOptInfo + * structures more efficient. + */ +typedef struct RelInfoEntry { - Relids join_relids; /* hash key --- MUST BE FIRST */ - RelOptInfo *join_rel; -} JoinHashEntry; + Relids relids; /* hash key --- MUST BE FIRST */ + RelOptInfo *rel; +} RelInfoEntry; static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *input_rel, @@ -479,11 +483,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid) } /* - * build_join_rel_hash - * Construct the auxiliary hash table for join relations. + * build_rel_hash + * Construct the auxiliary hash table for relations. 
*/ static void -build_join_rel_hash(PlannerInfo *root) +build_rel_hash(RelInfoList *list) { HTAB *hashtab; HASHCTL hash_ctl; @@ -491,47 +495,49 @@ build_join_rel_hash(PlannerInfo *root) /* Create the hash table */ hash_ctl.keysize = sizeof(Relids); - hash_ctl.entrysize = sizeof(JoinHashEntry); + hash_ctl.entrysize = sizeof(RelInfoEntry); hash_ctl.hash = bitmap_hash; hash_ctl.match = bitmap_match; hash_ctl.hcxt = CurrentMemoryContext; - hashtab = hash_create("JoinRelHashTable", + hashtab = hash_create("RelHashTable", 256L, &hash_ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); - /* Insert all the already-existing joinrels */ - foreach(l, root->join_rel_list) + /* Insert all the already-existing relations */ + foreach(l, list->items) { RelOptInfo *rel = (RelOptInfo *) lfirst(l); - JoinHashEntry *hentry; + RelInfoEntry *hentry; bool found; - hentry = (JoinHashEntry *) hash_search(hashtab, - &(rel->relids), - HASH_ENTER, - &found); + hentry = (RelInfoEntry *) hash_search(hashtab, + &(rel->relids), + HASH_ENTER, + &found); Assert(!found); - hentry->join_rel = rel; + hentry->rel = rel; } - root->join_rel_hash = hashtab; + list->hash = hashtab; } /* - * find_join_rel - * Returns relation entry corresponding to 'relids' (a set of RT indexes), - * or NULL if none exists. This is for join relations. + * find_rel_info + * Find an RelOptInfo entry. */ -RelOptInfo * -find_join_rel(PlannerInfo *root, Relids relids) +static RelOptInfo * +find_rel_info(RelInfoList *list, Relids relids) { + if (list == NULL) + return NULL; + /* * Switch to using hash lookup when list grows "too long". The threshold * is arbitrary and is known only here. */ - if (!root->join_rel_hash && list_length(root->join_rel_list) > 32) - build_join_rel_hash(root); + if (!list->hash && list_length(list->items) > 32) + build_rel_hash(list); /* * Use either hashtable lookup or linear search, as appropriate. 
@@ -541,23 +547,23 @@ find_join_rel(PlannerInfo *root, Relids relids) * so would force relids out of a register and thus probably slow down the * list-search case. */ - if (root->join_rel_hash) + if (list->hash) { Relids hashkey = relids; - JoinHashEntry *hentry; + RelInfoEntry *hentry; - hentry = (JoinHashEntry *) hash_search(root->join_rel_hash, - &hashkey, - HASH_FIND, - NULL); + hentry = (RelInfoEntry *) hash_search(list->hash, + &hashkey, + HASH_FIND, + NULL); if (hentry) - return hentry->join_rel; + return hentry->rel; } else { ListCell *l; - foreach(l, root->join_rel_list) + foreach(l, list->items) { RelOptInfo *rel = (RelOptInfo *) lfirst(l); @@ -569,6 +575,54 @@ find_join_rel(PlannerInfo *root, Relids relids) return NULL; } +/* + * find_join_rel + * Returns relation entry corresponding to 'relids' (a set of RT indexes), + * or NULL if none exists. This is for join relations. + */ +RelOptInfo * +find_join_rel(PlannerInfo *root, Relids relids) +{ + return find_rel_info(root->join_rel_list, relids); +} + +/* + * add_rel_info + * Add given relation to the given list. Also add it to the auxiliary + * hashtable if there is one. + */ +static void +add_rel_info(RelInfoList *list, RelOptInfo *rel) +{ + /* GEQO requires us to append the new relation to the end of the list! */ + list->items = lappend(list->items, rel); + + /* store it into the auxiliary hashtable if there is one. */ + if (list->hash) + { + RelInfoEntry *hentry; + bool found; + + hentry = (RelInfoEntry *) hash_search(list->hash, + &(rel->relids), + HASH_ENTER, + &found); + Assert(!found); + hentry->rel = rel; + } +} + +/* + * add_join_rel + * Add given join relation to the list of join relations in the given + * PlannerInfo. 
+ */ +static void +add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) +{ + add_rel_info(root->join_rel_list, joinrel); +} + /* * set_foreign_rel_properties * Set up foreign-join fields if outer and inner relation are foreign @@ -618,32 +672,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel, } } -/* - * add_join_rel - * Add given join relation to the list of join relations in the given - * PlannerInfo. Also add it to the auxiliary hashtable if there is one. - */ -static void -add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) -{ - /* GEQO requires us to append the new joinrel to the end of the list! */ - root->join_rel_list = lappend(root->join_rel_list, joinrel); - - /* store it into the auxiliary hashtable if there is one. */ - if (root->join_rel_hash) - { - JoinHashEntry *hentry; - bool found; - - hentry = (JoinHashEntry *) hash_search(root->join_rel_hash, - &(joinrel->relids), - HASH_ENTER, - &found); - Assert(!found); - hentry->join_rel = joinrel; - } -} - /* * build_join_rel * Returns relation entry corresponding to the union of two given rels, @@ -1457,22 +1485,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel, RelOptInfo * fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids) { + RelInfoList *list = &root->upper_rels[kind]; RelOptInfo *upperrel; - ListCell *lc; - - /* - * For the moment, our indexing data structure is just a List for each - * relation kind. If we ever get so many of one kind that this stops - * working well, we can improve it. No code outside this function should - * assume anything about how to find a particular upperrel. 
- */ /* If we already made this upperrel for the query, return it */ - foreach(lc, root->upper_rels[kind]) + if (list) { - upperrel = (RelOptInfo *) lfirst(lc); - - if (bms_equal(upperrel->relids, relids)) + upperrel = find_rel_info(list, relids); + if (upperrel) return upperrel; } @@ -1491,7 +1511,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids) upperrel->cheapest_unique_path = NULL; upperrel->cheapest_parameterized_paths = NIL; - root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel); + add_rel_info(&root->upper_rels[kind], upperrel); return upperrel; } diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 14ccfc1ac1..1951ae7c11 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -80,6 +80,26 @@ typedef enum UpperRelationKind /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */ } UpperRelationKind; +/* + * A structure consisting of a list and a hash table to store relation-specific + * information. + * + * For small problems we just scan the list to do lookups, but when there are + * many relations we build a hash table for faster lookups. The hash table is + * present and valid when 'hash' is not NULL. Note that we still maintain the + * list even when using the hash table for lookups; this simplifies life for + * GEQO. + */ +typedef struct RelInfoList +{ + pg_node_attr(no_copy_equal, no_read) + + NodeTag type; + + List *items; + struct HTAB *hash pg_node_attr(read_write_ignore); +} RelInfoList; + /*---------- * PlannerGlobal * Global information for planning/optimization @@ -270,15 +290,9 @@ struct PlannerInfo /* * join_rel_list is a list of all join-relation RelOptInfos we have - * considered in this planning run. For small problems we just scan the - * list to do lookups, but when there are many join relations we build a - * hash table for faster lookups. The hash table is present and valid - * when join_rel_hash is not NULL. 
Note that we still maintain the list - * even when using the hash table for lookups; this simplifies life for - * GEQO. + * considered in this planning run. */ - List *join_rel_list; - struct HTAB *join_rel_hash pg_node_attr(read_write_ignore); + RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */ /* * When doing a dynamic-programming-style join search, join_rel_level[k] @@ -413,7 +427,7 @@ struct PlannerInfo * Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular * upper rel. */ - List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); + RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); /* Result tlists chosen by grouping_planner for upper-stage processing */ struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 547d14b3e7..5255160212 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1291,7 +1291,6 @@ Join JoinCostWorkspace JoinDomain JoinExpr -JoinHashEntry JoinPath JoinPathExtraData JoinState @@ -2377,6 +2376,8 @@ RelFileNumber RelIdCacheEnt RelInfo RelInfoArr +RelInfoEntry +RelInfoList RelMapFile RelMapping RelOptInfo -- 2.43.0 [application/octet-stream] v10-0002-Implement-Eager-Aggregation.patch (162.0K, 3-v10-0002-Implement-Eager-Aggregation.patch) download | inline diff: From aaac4e4c4bcd1d259a95ca7b99288fefa3dd832d Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Tue, 11 Jun 2024 16:01:26 +0900 Subject: [PATCH v10 2/2] Implement Eager Aggregation Eager aggregation is a query optimization technique that partially pushes aggregation past a join, and finalizes it once all the relations are joined. Eager aggregation may reduce the number of input rows to the join and thus could result in a better overall plan. 
A plan with eager aggregation looks like: EXPLAIN (COSTS OFF) SELECT a.i, avg(b.y) FROM a JOIN b ON a.i = b.j GROUP BY a.i; Finalize HashAggregate Group Key: a.i -> Nested Loop -> Partial HashAggregate Group Key: b.j -> Seq Scan on b -> Index Only Scan using a_pkey on a Index Cond: (i = b.j) During the construction of the join tree, we evaluate each base or join relation to determine if eager aggregation can be applied. If feasible, we create a separate RelOptInfo called a "grouped relation" and store it in root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG]. Grouped relation paths can be generated in two ways. The first method involves adding sorted and hashed partial aggregation paths on top of the non-grouped paths. To limit planning time, we only consider the cheapest or suitably-sorted non-grouped paths during this phase. Alternatively, grouped paths can be generated by joining a grouped relation with a non-grouped relation. Joining two grouped relations does not seem to be very useful and is currently not supported. For the partial aggregation that is pushed down to a non-aggregated relation, we need to consider all expressions from this relation that are involved in upper join clauses and include them in the grouping keys. This ensures that we have the correct input for the upper joins and that an aggregated row from the partial aggregation matches the other side of the join if and only if each row in the partial group does, which is crucial for maintaining correctness. One restriction is that we cannot push partial aggregation down to a relation that is in the nullable side of an outer join, because the NULL-extended rows produced by the outer join would not be available when we perform the partial aggregation, while with a non-eager-aggregation plan these rows are available for the top-level aggregation. 
Pushing partial aggregation in this case may result in the rows being grouped differently than expected, or produce incorrect values from the aggregate functions. If we have generated a grouped relation for the topmost join relation, we finalize its paths at the end. The final path will compete in the usual way with paths built from regular planning. Since eager aggregation can generate many upper relations of partial aggregation, we introduce a RelInfoList structure, which encapsulates both a list and a hash table, so that we can leverage the hash table for faster lookups not only for join relations but also for upper relations. Eager aggregation can use significantly more CPU time and memory than regular planning when the query involves aggregates and many joining relations. However, in some cases, the resulting plan can be much better, justifying the additional planning effort. All the same, for now, turn this feature off by default. --- src/backend/optimizer/README | 79 + src/backend/optimizer/geqo/geqo_eval.c | 104 +- src/backend/optimizer/path/allpaths.c | 441 ++++++ src/backend/optimizer/path/joinrels.c | 135 ++ src/backend/optimizer/plan/initsplan.c | 252 ++++ src/backend/optimizer/plan/planmain.c | 12 + src/backend/optimizer/plan/planner.c | 100 +- src/backend/optimizer/util/appendinfo.c | 60 + src/backend/optimizer/util/pathnode.c | 12 +- src/backend/optimizer/util/relnode.c | 770 +++++++++- src/backend/utils/misc/guc_tables.c | 10 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/nodes/pathnodes.h | 100 ++ src/include/optimizer/pathnode.h | 9 + src/include/optimizer/paths.h | 5 + src/include/optimizer/planmain.h | 1 + src/test/regress/expected/eager_aggregate.out | 1293 +++++++++++++++++ src/test/regress/expected/sysviews.out | 3 +- src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/eager_aggregate.sql | 192 +++ src/tools/pgindent/typedefs.list | 4 + 21 files changed, 3488 insertions(+), 97 deletions(-) create mode 100644 
src/test/regress/expected/eager_aggregate.out create mode 100644 src/test/regress/sql/eager_aggregate.sql diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README index 2ab4f3dbf3..6f79ef531e 100644 --- a/src/backend/optimizer/README +++ b/src/backend/optimizer/README @@ -1497,3 +1497,82 @@ breaking down aggregation or grouping over a partitioned relation into aggregation or grouping over its partitions is called partitionwise aggregation. Especially when the partition keys match the GROUP BY clause, this can be significantly faster than the regular method. + +Eager aggregation +----------------- + +Eager aggregation is a query optimization technique that partially pushes +aggregation past a join, and finalizes it once all the relations are joined. +Eager aggregation may reduce the number of input rows to the join and thus +could result in a better overall plan. + +For example: + + EXPLAIN (COSTS OFF) + SELECT a.i, avg(b.y) + FROM a JOIN b ON a.i = b.j + GROUP BY a.i; + + Finalize HashAggregate + Group Key: a.i + -> Nested Loop + -> Partial HashAggregate + Group Key: b.j + -> Seq Scan on b + -> Index Only Scan using a_pkey on a + Index Cond: (i = b.j) + +If the partial aggregation on table B significantly reduces the number of +input rows, the join above will be much cheaper, leading to a more efficient +final plan. + +For the partial aggregation that is pushed down to a non-aggregated relation, +we need to consider all expressions from this relation that are involved in +upper join clauses and include them in the grouping keys. This ensures that we +have the correct input for the upper joins and that an aggregated row from the +partial aggregation matches the other side of the join if and only if each row +in the partial group does, which is crucial for maintaining correctness. 
+ +One restriction is that we cannot push partial aggregation down to a relation +that is in the nullable side of an outer join, because the NULL-extended rows +produced by the outer join would not be available when we perform the partial +aggregation, while with a non-eager-aggregation plan these rows are available +for the top-level aggregation. Pushing partial aggregation in this case may +result in the rows being grouped differently than expected, or produce +incorrect values from the aggregate functions. + +We can also apply eager aggregation to a join: + + EXPLAIN (COSTS OFF) + SELECT a.i, avg(b.y + c.z) + FROM a JOIN b ON a.i = b.j + JOIN c ON b.j = c.i + GROUP BY a.i; + + Finalize HashAggregate + Group Key: a.i + -> Nested Loop + -> Partial HashAggregate + Group Key: b.j + -> Hash Join + Hash Cond: (b.j = c.i) + -> Seq Scan on b + -> Hash + -> Seq Scan on c + -> Index Only Scan using a_pkey on a + Index Cond: (i = b.j) + +During the construction of the join tree, we evaluate each base or join +relation to determine if eager aggregation can be applied. If feasible, we +create a separate RelOptInfo called a "grouped relation" and generate grouped +paths by adding sorted and hashed partial aggregation paths on top of the +non-grouped paths. To limit planning time, we consider only the cheapest +non-grouped paths in this step. + +Another way to generate grouped paths is to join a grouped relation with a +non-grouped relation. Joining two grouped relations does not seem to be very +useful and is currently not supported. + +If we have generated a grouped relation for the topmost join relation, we need +to finalize its paths at the end. The final path will compete in the usual way +with paths built from regular planning. 
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c index 1141156899..b77805d27d 100644 --- a/src/backend/optimizer/geqo/geqo_eval.c +++ b/src/backend/optimizer/geqo/geqo_eval.c @@ -39,10 +39,20 @@ typedef struct int size; /* number of input relations in clump */ } Clump; +/* The original length and hashtable of a RelInfoList */ +typedef struct +{ + int savelength; + struct HTAB *savehash; +} RelInfoListInfo; + static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene, bool force); static bool desirable_join(PlannerInfo *root, RelOptInfo *outer_rel, RelOptInfo *inner_rel); +static RelInfoListInfo save_relinfolist(RelInfoList *relinfo_list); +static void restore_relinfolist(RelInfoList *relinfo_list, + RelInfoListInfo *info); /* @@ -60,8 +70,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) MemoryContext oldcxt; RelOptInfo *joinrel; Cost fitness; - int savelength; - struct HTAB *savehash; + RelInfoListInfo save_join_rel; + RelInfoListInfo save_grouped_rel; + RelInfoListInfo save_grouped_info; /* * Create a private memory context that will hold all temp storage @@ -78,25 +89,33 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) oldcxt = MemoryContextSwitchTo(mycontext); /* - * gimme_tree will add entries to root->join_rel_list, which may or may - * not already contain some entries. The newly added entries will be - * recycled by the MemoryContextDelete below, so we must ensure that the - * list is restored to its former state before exiting. We can do this by - * truncating the list to its original length. NOTE this assumes that any - * added entries are appended at the end! + * gimme_tree will add entries to root->join_rel_list, root->agg_info_list + * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not + * already contain some entries. 
The newly added entries will be recycled + * by the MemoryContextDelete below, so we must ensure that each list of + * the RelInfoList structures is restored to its former state before + * exiting. We can do this by truncating each list to its original + * length. NOTE this assumes that any added entries are appended at the + * end! * - * We also must take care not to mess up the outer join_rel_list->hash, if - * there is one. We can do this by just temporarily setting the link to - * NULL. (If we are dealing with enough join rels, which we very likely - * are, a new hash table will get built and used locally.) + * We also must take care not to mess up the outer hash tables of the + * RelInfoList structures, if any. We can do this by just temporarily + * setting each link to NULL. (If we are dealing with enough join rels, + * which we very likely are, new hash tables will get built and used + * locally.) * * join_rel_level[] shouldn't be in use, so just Assert it isn't. */ - savelength = list_length(root->join_rel_list->items); - savehash = root->join_rel_list->hash; + save_join_rel = save_relinfolist(root->join_rel_list); + save_grouped_rel = + save_relinfolist(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG]); + save_grouped_info = save_relinfolist(root->agg_info_list); + Assert(root->join_rel_level == NULL); root->join_rel_list->hash = NULL; + root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL; + root->agg_info_list->hash = NULL; /* construct the best path for the given combination of relations */ joinrel = gimme_tree(root, tour, num_gene); @@ -118,12 +137,14 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) fitness = DBL_MAX; /* - * Restore join_rel_list to its former state, and put back original - * hashtable if any. + * Restore each of the lists in join_rel_list, agg_info_list and + * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put + * back original hashtable if any.
*/ - root->join_rel_list->items = list_truncate(root->join_rel_list->items, - savelength); - root->join_rel_list->hash = savehash; + restore_relinfolist(root->join_rel_list, &save_join_rel); + restore_relinfolist(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], + &save_grouped_rel); + restore_relinfolist(root->agg_info_list, &save_grouped_info); /* release all the memory acquired within gimme_tree */ MemoryContextSwitchTo(oldcxt); @@ -279,6 +300,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene, /* Find and save the cheapest paths for this joinrel */ set_cheapest(joinrel); + /* + * Except for the topmost scan/join rel, consider generating + * partial aggregation paths for the grouped relation on top + * of the paths of this rel. After that, we're done creating + * paths for the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(joinrel->relids, root->all_query_rels)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + rel_grouped = find_grouped_rel(root, joinrel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, joinrel, + agg_info); + set_cheapest(rel_grouped); + } + } + /* Absorb new clump into old */ old_clump->joinrel = joinrel; old_clump->size += new_clump->size; @@ -336,3 +378,27 @@ desirable_join(PlannerInfo *root, /* Otherwise postpone the join till later. */ return false; } + +/* + * Save the original length and hashtable of a RelInfoList. + */ +static RelInfoListInfo +save_relinfolist(RelInfoList *relinfo_list) +{ + RelInfoListInfo info; + + info.savelength = list_length(relinfo_list->items); + info.savehash = relinfo_list->hash; + + return info; +} + +/* + * Restore the original length and hashtable of a RelInfoList. 
+ */ +static void +restore_relinfolist(RelInfoList *relinfo_list, RelInfoListInfo *info) +{ + relinfo_list->items = list_truncate(relinfo_list->items, info->savelength); + relinfo_list->hash = info->savehash; +} diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index b550e707a4..03795a0ec4 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -40,6 +40,7 @@ #include "optimizer/paths.h" #include "optimizer/plancat.h" #include "optimizer/planner.h" +#include "optimizer/prep.h" #include "optimizer/tlist.h" #include "parser/parse_clause.h" #include "parser/parsetree.h" @@ -47,6 +48,7 @@ #include "port/pg_bitutils.h" #include "rewrite/rewriteManip.h" #include "utils/lsyscache.h" +#include "utils/selfuncs.h" /* Bitmask flags for pushdown_safety_info.unsafeFlags */ @@ -77,6 +79,7 @@ typedef enum pushdown_safe_type /* These parameters are set by GUC */ bool enable_geqo = false; /* just in case GUC doesn't set it */ +bool enable_eager_aggregate = false; int geqo_threshold; int min_parallel_table_scan_size; int min_parallel_index_scan_size; @@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL; static void set_base_rel_consider_startup(PlannerInfo *root); static void set_base_rel_sizes(PlannerInfo *root); +static void setup_base_grouped_rels(PlannerInfo *root); static void set_base_rel_pathlists(PlannerInfo *root); static void set_rel_size(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); @@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); +static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel); static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel, List *live_childrels, List *all_child_pathkeys); @@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List 
*joinlist) */ set_base_rel_sizes(root); + /* + * Build grouped base relations for each base rel if possible. + */ + setup_base_grouped_rels(root); + /* * We should now have size estimates for every actual table involved in * the query, and we also know which if any have been deleted from the @@ -323,6 +333,53 @@ set_base_rel_sizes(PlannerInfo *root) } } +/* + * setup_base_grouped_rels + * For each "plain" base relation, build a grouped base relation if eager + * aggregation is possible and if this relation can produce grouped paths. + */ +static void +setup_base_grouped_rels(PlannerInfo *root) +{ + Index rti; + + /* + * If there are no aggregate expressions or grouping expressions, eager + * aggregation is not possible. + */ + if (root->agg_clause_list == NIL || + root->group_expr_list == NIL) + return; + + /* + * Eager aggregation only makes sense if there are multiple base rels in + * the query. + */ + if (bms_membership(root->all_baserels) != BMS_MULTIPLE) + return; + + for (rti = 1; rti < root->simple_rel_array_size; rti++) + { + RelOptInfo *rel = root->simple_rel_array[rti]; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + /* there may be empty slots corresponding to non-baserel RTEs */ + if (rel == NULL) + continue; + + Assert(rel->relid == rti); /* sanity check on array */ + Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */ + + rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info); + if (rel_grouped) + { + /* Make the grouped relation available for joining. */ + add_grouped_rel(root, rel_grouped, agg_info); + } + } +} + /* * set_base_rel_pathlists * Finds all paths available for scanning each base-relation entry. @@ -559,6 +616,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, /* Now find the cheapest of the paths for this rel */ set_cheapest(rel); + /* + * If a grouped relation for this rel exists, build partial aggregation + * paths for it. 
+ * + * Note that this can only happen after we've called set_cheapest() for + * this base rel, because we need its cheapest paths. + */ + set_grouped_rel_pathlist(root, rel); + #ifdef OPTIMIZER_DEBUG pprint(rel); #endif @@ -1294,6 +1360,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, add_paths_to_append_rel(root, rel, live_childrels); } +/* + * set_grouped_rel_pathlist + * If a grouped relation for the given 'rel' exists, build partial + * aggregation paths for it. + */ +static void +set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel) +{ + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + /* Add paths to the grouped base relation if one exists. */ + rel_grouped = find_grouped_rel(root, rel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, rel, + agg_info); + set_cheapest(rel_grouped); + } +} + /* * add_paths_to_append_rel @@ -3302,6 +3390,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r } } +/* + * generate_grouped_paths + * Generate paths for a grouped relation by adding sorted and hashed + * partial aggregation paths on top of paths of the plain base or join + * relation. + * + * The information needed is provided by the RelAggInfo structure. + */ +void +generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, + RelOptInfo *rel_plain, RelAggInfo *agg_info) +{ + AggClauseCosts agg_costs; + bool can_hash; + bool can_sort; + Path *cheapest_total_path = NULL; + Path *cheapest_partial_path = NULL; + double dNumGroups = 0; + double dNumPartialGroups = 0; + + if (IS_DUMMY_REL(rel_plain)) + { + mark_dummy_rel(rel_grouped); + return; + } + + MemSet(&agg_costs, 0, sizeof(AggClauseCosts)); + get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs); + + /* + * Determine whether it's possible to perform sort-based implementations + * of grouping.
+ */ + can_sort = grouping_is_sortable(agg_info->group_clauses); + + /* + * Determine whether we should consider hash-based implementations of + * grouping. + */ + Assert(root->numOrderedAggs == 0); + can_hash = (agg_info->group_clauses != NIL && + grouping_is_hashable(agg_info->group_clauses)); + + /* + * Consider whether we should generate partially aggregated non-partial + * paths. We can only do this if we have a non-partial path. + */ + if (rel_plain->pathlist != NIL) + { + cheapest_total_path = rel_plain->cheapest_total_path; + Assert(cheapest_total_path != NULL); + } + + /* + * If parallelism is possible for rel_grouped, then we should consider + * generating partially-grouped partial paths. However, if the plain rel + * has no partial paths, then we can't. + */ + if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL) + { + cheapest_partial_path = linitial(rel_plain->partial_pathlist); + Assert(cheapest_partial_path != NULL); + } + + /* Estimate number of partial groups. */ + if (cheapest_total_path != NULL) + dNumGroups = estimate_num_groups(root, + agg_info->group_exprs, + cheapest_total_path->rows, + NULL, NULL); + if (cheapest_partial_path != NULL) + dNumPartialGroups = estimate_num_groups(root, + agg_info->group_exprs, + cheapest_partial_path->rows, + NULL, NULL); + + if (can_sort && cheapest_total_path != NULL) + { + ListCell *lc; + + /* + * Use any available suitably-sorted path as input, and also consider + * sorting the cheapest-total path. + */ + foreach(lc, rel_plain->pathlist) + { + Path *input_path = (Path *) lfirst(lc); + Path *path; + bool is_sorted; + int presorted_keys; + + /* + * Since the path originates from a non-grouped relation that is + * not aware of eager aggregation, we must ensure that it provides + * the correct input for partial aggregation. 
+ */ + path = (Path *) create_projection_path(root, + rel_grouped, + input_path, + agg_info->agg_input); + + is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys, + path->pathkeys, + &presorted_keys); + if (!is_sorted) + { + /* + * Try at least sorting the cheapest path and also try + * incrementally sorting any path which is partially sorted + * already (no need to deal with paths which have presorted + * keys when incremental sort is disabled unless it's the + * cheapest input path). + */ + if (input_path != cheapest_total_path && + (presorted_keys == 0 || !enable_incremental_sort)) + continue; + + /* + * We've no need to consider both a sort and incremental sort. + * We'll just do a sort if there are no presorted keys and an + * incremental sort when there are presorted keys. + */ + if (presorted_keys == 0 || !enable_incremental_sort) + path = (Path *) create_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + -1.0); + else + path = (Path *) create_incremental_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + presorted_keys, + -1.0); + } + + /* + * qual is NIL because the HAVING clause cannot be evaluated until + * the final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_SORTED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumGroups); + + add_path(rel_grouped, path); + } + } + + if (can_sort && cheapest_partial_path != NULL) + { + ListCell *lc; + + /* Similar to above logic, but for partial paths. */ + foreach(lc, rel_plain->partial_pathlist) + { + Path *input_path = (Path *) lfirst(lc); + Path *path; + bool is_sorted; + int presorted_keys; + + /* + * Since the path originates from a non-grouped relation that is + * not aware of eager aggregation, we must ensure that it provides + * the correct input for partial aggregation. 
+ */ + path = (Path *) create_projection_path(root, + rel_grouped, + input_path, + agg_info->agg_input); + + is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys, + path->pathkeys, + &presorted_keys); + + if (!is_sorted) + { + /* + * Try at least sorting the cheapest path and also try + * incrementally sorting any path which is partially sorted + * already (no need to deal with paths which have presorted + * keys when incremental sort is disabled unless it's the + * cheapest input path). + */ + if (input_path != cheapest_partial_path && + (presorted_keys == 0 || !enable_incremental_sort)) + continue; + + /* + * We've no need to consider both a sort and incremental sort. + * We'll just do a sort if there are no presorted keys and an + * incremental sort when there are presorted keys. + */ + if (presorted_keys == 0 || !enable_incremental_sort) + path = (Path *) create_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + -1.0); + else + path = (Path *) create_incremental_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + presorted_keys, + -1.0); + } + + /* + * qual is NIL because the HAVING clause cannot be evaluated until + * the final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_SORTED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumPartialGroups); + + add_partial_path(rel_grouped, path); + } + } + + /* + * Add a partially-grouped HashAgg Path where possible + */ + if (can_hash && cheapest_total_path != NULL) + { + Path *path; + + /* + * Since the path originates from a non-grouped relation that is not + * aware of eager aggregation, we must ensure that it provides the + * correct input for partial aggregation. 
+ */ + path = (Path *) create_projection_path(root, + rel_grouped, + cheapest_total_path, + agg_info->agg_input); + + /* + * qual is NIL because the HAVING clause cannot be evaluated until the + * final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_HASHED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumGroups); + + add_path(rel_grouped, path); + } + + /* + * Now add a partially-grouped HashAgg partial Path where possible + */ + if (can_hash && cheapest_partial_path != NULL) + { + Path *path; + + /* + * Since the path originates from a non-grouped relation that is not + * aware of eager aggregation, we must ensure that it provides the + * correct input for partial aggregation. + */ + path = (Path *) create_projection_path(root, + rel_grouped, + cheapest_partial_path, + agg_info->agg_input); + + /* + * qual is NIL because the HAVING clause cannot be evaluated until the + * final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_HASHED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumPartialGroups); + + add_partial_path(rel_grouped, path); + } +} + /* * make_rel_from_joinlist * Build access paths using a "joinlist" to guide the join path search. @@ -3462,6 +3855,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) * * After that, we're done creating paths for the joinrel, so run * set_cheapest(). + * + * In addition, we also run generate_grouped_paths() for the grouped + * relation of each just-processed joinrel, and run set_cheapest() for + * the grouped relation afterwards. 
*/ foreach(lc, root->join_rel_level[lev]) { @@ -3482,6 +3879,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) /* Find and save the cheapest paths for this rel */ set_cheapest(rel); + /* + * Except for the topmost scan/join rel, consider generating + * partial aggregation paths for the grouped relation on top of + * the paths of this rel. After that, we're done creating paths + * for the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(rel->relids, root->all_query_rels)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + rel_grouped = find_grouped_rel(root, rel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, rel, + agg_info); + set_cheapest(rel_grouped); + } + } + #ifdef OPTIMIZER_DEBUG pprint(rel); #endif @@ -4350,6 +4768,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel) if (IS_DUMMY_REL(child_rel)) continue; + /* + * Except for the topmost scan/join rel, consider generating partial + * aggregation paths for the grouped relation on top of the paths of + * this partitioned child-join. After that, we're done creating paths + * for the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(IS_OTHER_REL(rel) ? 
+ rel->top_parent_relids : rel->relids, + root->all_query_rels)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + rel_grouped = find_grouped_rel(root, child_rel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, child_rel, + agg_info); + set_cheapest(rel_grouped); + } + } + #ifdef OPTIMIZER_DEBUG pprint(child_rel); #endif diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 7db5e30eef..e1a2d3b414 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -16,11 +16,13 @@ #include "miscadmin.h" #include "optimizer/appendinfo.h" +#include "optimizer/cost.h" #include "optimizer/joininfo.h" #include "optimizer/pathnode.h" #include "optimizer/paths.h" #include "partitioning/partbounds.h" #include "utils/memutils.h" +#include "utils/selfuncs.h" static void make_rels_by_clause_joins(PlannerInfo *root, @@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel); static bool restriction_is_constant_false(List *restrictlist, RelOptInfo *joinrel, bool only_pushed_down); +static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, + RelOptInfo *rel2, RelOptInfo *joinrel, + SpecialJoinInfo *sjinfo, List *restrictlist); static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, RelOptInfo *joinrel, SpecialJoinInfo *sjinfo, List *restrictlist); @@ -771,6 +776,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2) return joinrel; } + /* Build a grouped join relation for 'joinrel' if possible. */ + make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo, + restrictlist); + /* Add paths to the join relation. 
*/ populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo, restrictlist); @@ -882,6 +891,127 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids, return input_relids; } +/* + * make_grouped_join_rel + * Build a grouped join relation out of 'joinrel' if eager aggregation is + * possible and the 'joinrel' can produce grouped paths. + * + * We also generate partial aggregation paths for the grouped relation by + * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by + * joining the grouped paths of 'rel2' to the plain paths of 'rel1'. + */ +static void +make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, + RelOptInfo *rel2, RelOptInfo *joinrel, + SpecialJoinInfo *sjinfo, List *restrictlist) +{ + RelOptInfo *rel_grouped; + RelAggInfo *agg_info = NULL; + RelOptInfo *rel1_grouped; + RelOptInfo *rel2_grouped; + bool rel1_empty; + bool rel2_empty; + + /* + * If there are no aggregate expressions or grouping expressions, eager + * aggregation is not possible. + */ + if (root->agg_clause_list == NIL || + root->group_expr_list == NIL) + return; + + /* + * See if we already have a grouped joinrel for this joinrel. + */ + rel_grouped = find_grouped_rel(root, joinrel->relids, &agg_info); + + /* + * Construct a new RelOptInfo for the grouped join relation if there is no + * existing one. + */ + if (rel_grouped == NULL) + { + /* + * Prepare the information needed to create grouped paths for this + * join relation. + */ + agg_info = create_rel_agg_info(root, joinrel); + if (agg_info == NULL) + return; + + /* build a grouped relation out of the plain relation */ + rel_grouped = build_grouped_rel(root, joinrel); + rel_grouped->reltarget = agg_info->target; + rel_grouped->rows = agg_info->grouped_rows; + + /* + * Make the grouped relation available for further joining or for + * acting as the upper rel representing the result of partial + * aggregation. 
+ */ + add_grouped_rel(root, rel_grouped, agg_info); + } + + Assert(agg_info != NULL); + + /* We may have already proven this grouped join relation to be dummy. */ + if (IS_DUMMY_REL(rel_grouped)) + return; + + /* Retrieve the grouped relations for the two input rels */ + rel1_grouped = find_grouped_rel(root, rel1->relids, NULL); + rel2_grouped = find_grouped_rel(root, rel2->relids, NULL); + + rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped)); + rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped)); + + /* Nothing to do if there's no grouped relation. */ + if (rel1_empty && rel2_empty) + return; + + /* + * Joining two grouped relations is currently not supported. Grouping one + * side would alter the occurrence of the other side's aggregate transient + * states in the final aggregation input. While this issue could be + * addressed by adjusting the transient states, it is not deemed + * worthwhile for now. + */ + if (!rel1_empty && !rel2_empty) + return; + + /* Generate partial aggregation paths for the grouped relation */ + if (!rel1_empty) + { + set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2, + sjinfo, restrictlist); + populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped, + sjinfo, restrictlist); + + /* + * It shouldn't happen that we have marked rel1_grouped as dummy in + * populate_joinrel_with_paths due to provably constant-false join + * restrictions, hence we wouldn't end up with a plan that has Aggref + * in non-Agg plan node. 
+ */ + Assert(!IS_DUMMY_REL(rel1_grouped)); + } + else if (!rel2_empty) + { + set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped, + sjinfo, restrictlist); + populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped, + sjinfo, restrictlist); + + /* + * It shouldn't happen that we have marked rel2_grouped as dummy in + * populate_joinrel_with_paths due to provably constant-false join + * restrictions, hence we wouldn't end up with a plan that has Aggref + * in non-Agg plan node. + */ + Assert(!IS_DUMMY_REL(rel2_grouped)); + } +} + /* * populate_joinrel_with_paths * Add paths to the given joinrel for given pair of joining relations. The @@ -1674,6 +1804,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, adjust_child_relids(joinrel->relids, nappinfos, appinfos))); + /* Build a grouped join relation for 'child_joinrel' if possible */ + make_grouped_join_rel(root, child_rel1, child_rel2, + child_joinrel, child_sjinfo, + child_restrictlist); + /* And make paths for the child join */ populate_joinrel_with_paths(root, child_rel1, child_rel2, child_joinrel, child_sjinfo, diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index e2c68fe6f9..2ca035dd80 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -14,6 +14,7 @@ */ #include "postgres.h" +#include "access/nbtree.h" #include "catalog/pg_type.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" @@ -80,6 +81,8 @@ typedef struct JoinTreeItem } JoinTreeItem; +static void create_agg_clause_infos(PlannerInfo *root); +static void create_grouping_expr_infos(PlannerInfo *root); static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex); static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode, @@ -327,6 +330,255 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars, } } +/* + * setup_eager_aggregation + * Check if eager aggregation is 
applicable, and if so collect suitable + * aggregate expressions and grouping expressions in the query. + */ +void +setup_eager_aggregation(PlannerInfo *root) +{ + /* + * Don't apply eager aggregation if disabled by user. + */ + if (!enable_eager_aggregate) + return; + + /* + * Don't apply eager aggregation if there are no available GROUP BY + * clauses. + */ + if (!root->processed_groupClause) + return; + + /* + * For now we don't try to support grouping sets. + */ + if (root->parse->groupingSets) + return; + + /* + * For now we don't try to support DISTINCT or ORDER BY aggregates. + */ + if (root->numOrderedAggs > 0) + return; + + /* + * If there are any aggregates that do not support partial mode, or any + * partial aggregates that are non-serializable, do not apply eager + * aggregation. + */ + if (root->hasNonPartialAggs || root->hasNonSerialAggs) + return; + + /* + * We don't try to apply eager aggregation if there are set-returning + * functions in targetlist. + */ + if (root->parse->hasTargetSRFs) + return; + + /* + * Collect aggregate expressions and plain Vars that appear in targetlist + * and havingQual. + */ + create_agg_clause_infos(root); + + /* + * If there are no suitable aggregate expressions, we cannot apply eager + * aggregation. + */ + if (root->agg_clause_list == NIL) + return; + + /* + * Collect grouping expressions that appear in grouping clauses. + */ + create_grouping_expr_infos(root); +} + +/* + * create_agg_clause_infos + * Search the targetlist and havingQual for Aggrefs and plain Vars, and + * create an AggClauseInfo for each Aggref node. + */ +static void +create_agg_clause_infos(PlannerInfo *root) +{ + List *tlist_exprs; + ListCell *lc; + + Assert(root->agg_clause_list == NIL); + Assert(root->tlist_vars == NIL); + + tlist_exprs = pull_var_clause((Node *) root->processed_tlist, + PVC_INCLUDE_AGGREGATES | + PVC_RECURSE_WINDOWFUNCS | + PVC_RECURSE_PLACEHOLDERS); + + /* + * For now we don't try to support GROUPING() expressions. 
+ */ + foreach(lc, tlist_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + + if (IsA(expr, GroupingFunc)) + return; + } + + /* + * Aggregates within the HAVING clause need to be processed in the same + * way as those in the targetlist. Note that HAVING can contain Aggrefs + * but not WindowFuncs. + */ + if (root->parse->havingQual != NULL) + { + List *having_exprs; + + having_exprs = pull_var_clause((Node *) root->parse->havingQual, + PVC_INCLUDE_AGGREGATES | + PVC_RECURSE_PLACEHOLDERS); + if (having_exprs != NIL) + { + tlist_exprs = list_concat(tlist_exprs, having_exprs); + list_free(having_exprs); + } + } + + foreach(lc, tlist_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + Aggref *aggref; + AggClauseInfo *ac_info; + + /* + * collect plain Vars for future reference + */ + if (IsA(expr, Var)) + { + root->tlist_vars = list_append_unique(root->tlist_vars, expr); + continue; + } + + aggref = castNode(Aggref, expr); + + Assert(aggref->aggorder == NIL); + Assert(aggref->aggdistinct == NIL); + + ac_info = makeNode(AggClauseInfo); + ac_info->aggref = aggref; + ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref); + + root->agg_clause_list = + list_append_unique(root->agg_clause_list, ac_info); + } + + list_free(tlist_exprs); +} + +/* + * create_grouping_expr_infos + * Create GroupExprInfo for each expression usable as grouping key. + * + * If any grouping expression is not suitable, we will just return with + * root->group_expr_list being NIL. 
+ */ +static void +create_grouping_expr_infos(PlannerInfo *root) +{ + List *exprs = NIL; + List *sortgrouprefs = NIL; + List *btree_opfamilies = NIL; + ListCell *lc, + *lc1, + *lc2, + *lc3; + + Assert(root->group_expr_list == NIL); + + foreach(lc, root->processed_groupClause) + { + SortGroupClause *sgc = lfirst_node(SortGroupClause, lc); + TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist); + TypeCacheEntry *tce; + Oid equalimageproc; + Oid eq_op; + List *eq_opfamilies; + Oid btree_opfamily; + + Assert(tle->ressortgroupref > 0); + + /* + * For now we only support plain Vars as grouping expressions. + */ + if (!IsA(tle->expr, Var)) + return; + + /* + * Eager aggregation is only possible if equality of grouping keys, as + * defined by the equality operator, implies bitwise equality. + * Otherwise, if we put keys with different byte images into the same + * group, we may lose some information that could be needed to + * evaluate upper qual clauses. + * + * For example, the NUMERIC data type is not supported because values + * that fall into the same group according to the equality operator + * (e.g. 0 and 0.0) can have different scale. + */ + tce = lookup_type_cache(exprType((Node *) tle->expr), + TYPECACHE_BTREE_OPFAMILY); + if (!OidIsValid(tce->btree_opf) || + !OidIsValid(tce->btree_opintype)) + return; + + equalimageproc = get_opfamily_proc(tce->btree_opf, + tce->btree_opintype, + tce->btree_opintype, + BTEQUALIMAGE_PROC); + if (!OidIsValid(equalimageproc) || + !DatumGetBool(OidFunctionCall1Coll(equalimageproc, + tce->typcollation, + ObjectIdGetDatum(tce->btree_opintype)))) + return; + + /* + * Get the operator in the btree's opfamily. 
+ */ + eq_op = get_opfamily_member(tce->btree_opf, + tce->btree_opintype, + tce->btree_opintype, + BTEqualStrategyNumber); + if (!OidIsValid(eq_op)) + return; + eq_opfamilies = get_mergejoin_opfamilies(eq_op); + if (!eq_opfamilies) + return; + btree_opfamily = linitial_oid(eq_opfamilies); + + exprs = lappend(exprs, tle->expr); + sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref); + btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily); + } + + /* + * Construct GroupExprInfo for each expression. + */ + forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies) + { + Expr *expr = (Expr *) lfirst(lc1); + int sortgroupref = lfirst_int(lc2); + Oid btree_opfamily = lfirst_oid(lc3); + GroupExprInfo *ge_info; + + ge_info = makeNode(GroupExprInfo); + ge_info->expr = (Expr *) copyObject(expr); + ge_info->sortgroupref = sortgroupref; + ge_info->btree_opfamily = btree_opfamily; + + root->group_expr_list = lappend(root->group_expr_list, ge_info); + } +} /***************************************************************************** * diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index fd8b2b0ca3..ece6936e23 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -67,6 +67,9 @@ query_planner(PlannerInfo *root, root->join_rel_list = makeNode(RelInfoList); root->join_rel_list->items = NIL; root->join_rel_list->hash = NULL; + root->agg_info_list = makeNode(RelInfoList); + root->agg_info_list->items = NIL; + root->agg_info_list->hash = NULL; root->join_rel_level = NULL; root->join_cur_level = 0; root->canon_pathkeys = NIL; @@ -77,6 +80,9 @@ query_planner(PlannerInfo *root, root->placeholder_list = NIL; root->placeholder_array = NULL; root->placeholder_array_size = 0; + root->agg_clause_list = NIL; + root->group_expr_list = NIL; + root->tlist_vars = NIL; root->fkey_list = NIL; root->initial_rels = NIL; @@ -258,6 +264,12 @@ query_planner(PlannerInfo *root, */ 
extract_restriction_or_clauses(root); + /* + * Check if eager aggregation is applicable, and if so, set up + * root->agg_clause_list and root->group_expr_list. + */ + setup_eager_aggregation(root); + /* * Now expand appendrels by adding "otherrels" for their children. We * delay this to the end so that we have as much information as possible diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 948afd9094..b403a46d53 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -225,7 +225,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, RelOptInfo *partially_grouped_rel, const AggClauseCosts *agg_costs, grouping_sets_data *gd, - double dNumGroups, GroupPathExtraData *extra); static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root, RelOptInfo *grouped_rel, @@ -3999,9 +3998,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, GroupPathExtraData *extra, RelOptInfo **partially_grouped_rel_p) { - Path *cheapest_path = input_rel->cheapest_total_path; RelOptInfo *partially_grouped_rel = NULL; - double dNumGroups; PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE; /* @@ -4082,23 +4079,21 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, /* Gather any partially grouped partial paths. */ if (partially_grouped_rel && partially_grouped_rel->partial_pathlist) - { gather_grouping_paths(root, partially_grouped_rel); - set_cheapest(partially_grouped_rel); - } /* - * Estimate number of groups. + * Now choose the best path(s) for partially_grouped_rel. + * + * Note that the non-partial paths can come either from the Gather above + * or from eager aggregation. 
*/ - dNumGroups = get_number_of_groups(root, - cheapest_path->rows, - gd, - extra->targetList); + if (partially_grouped_rel && partially_grouped_rel->pathlist) + set_cheapest(partially_grouped_rel); /* Build final grouping paths */ add_paths_to_grouping_rel(root, input_rel, grouped_rel, partially_grouped_rel, agg_costs, gd, - dNumGroups, extra); + extra); /* Give a helpful error if we failed to find any implementation */ if (grouped_rel->pathlist == NIL) @@ -6966,16 +6961,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, RelOptInfo *grouped_rel, RelOptInfo *partially_grouped_rel, const AggClauseCosts *agg_costs, - grouping_sets_data *gd, double dNumGroups, + grouping_sets_data *gd, GroupPathExtraData *extra) { Query *parse = root->parse; Path *cheapest_path = input_rel->cheapest_total_path; + Path *cheapest_partially_grouped_path = NULL; ListCell *lc; bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0; bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0; List *havingQual = (List *) extra->havingQual; AggClauseCosts *agg_final_costs = &extra->agg_final_costs; + double dNumGroups = 0; + double dNumFinalGroups = 0; + + /* + * Estimate number of groups for non-split aggregation. + */ + dNumGroups = get_number_of_groups(root, + cheapest_path->rows, + gd, + extra->targetList); + + if (partially_grouped_rel && partially_grouped_rel->pathlist) + { + cheapest_partially_grouped_path = + partially_grouped_rel->cheapest_total_path; + + /* + * Estimate number of groups for final phase of partial aggregation. 
+ */ + dNumFinalGroups = + get_number_of_groups(root, + cheapest_partially_grouped_path->rows, + gd, + extra->targetList); + } if (can_sort) { @@ -7087,7 +7108,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, path = make_ordered_path(root, grouped_rel, path, - partially_grouped_rel->cheapest_total_path, + cheapest_partially_grouped_path, info->pathkeys); if (path == NULL) @@ -7104,7 +7125,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, info->clauses, havingQual, agg_final_costs, - dNumGroups)); + dNumFinalGroups)); else add_path(grouped_rel, (Path *) create_group_path(root, @@ -7112,7 +7133,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, path, info->clauses, havingQual, - dNumGroups)); + dNumFinalGroups)); } } @@ -7154,19 +7175,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, */ if (partially_grouped_rel && partially_grouped_rel->pathlist) { - Path *path = partially_grouped_rel->cheapest_total_path; - add_path(grouped_rel, (Path *) create_agg_path(root, grouped_rel, - path, + cheapest_partially_grouped_path, grouped_rel->reltarget, AGG_HASHED, AGGSPLIT_FINAL_DESERIAL, root->processed_groupClause, havingQual, agg_final_costs, - dNumGroups)); + dNumFinalGroups)); } } @@ -7216,6 +7235,21 @@ create_partial_grouping_paths(PlannerInfo *root, bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0; bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0; + /* + * The partially_grouped_rel could have been already created due to eager + * aggregation. + */ + partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL); + Assert(enable_eager_aggregate || partially_grouped_rel == NULL); + + /* + * It is possible that the partially_grouped_rel created by eager + * aggregation is dummy. In this case we just set it to NULL. It might + * be created again by the following logic if possible. 
+ */ + if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel)) + partially_grouped_rel = NULL; + /* * Consider whether we should generate partially aggregated non-partial * paths. We can only do this if we have a non-partial path, and only if @@ -7239,19 +7273,27 @@ create_partial_grouping_paths(PlannerInfo *root, * If we can't partially aggregate partial paths, and we can't partially * aggregate non-partial paths, then don't bother creating the new * RelOptInfo at all, unless the caller specified force_rel_creation. + * + * Note that the partially_grouped_rel could have been already created and + * populated with appropriate paths by eager aggregation. */ if (cheapest_total_path == NULL && cheapest_partial_path == NULL && + (partially_grouped_rel == NULL || + partially_grouped_rel->pathlist == NIL) && !force_rel_creation) return NULL; /* * Build a new upper relation to represent the result of partially - * aggregating the rows from the input relation. - */ - partially_grouped_rel = fetch_upper_rel(root, - UPPERREL_PARTIAL_GROUP_AGG, - grouped_rel->relids); + * aggregating the rows from the input relation. The relation may already + * exist due to eager aggregation, in which case we don't need to create + * it. + */ + if (partially_grouped_rel == NULL) + partially_grouped_rel = fetch_upper_rel(root, + UPPERREL_PARTIAL_GROUP_AGG, + grouped_rel->relids); partially_grouped_rel->consider_parallel = grouped_rel->consider_parallel; partially_grouped_rel->reloptkind = grouped_rel->reloptkind; @@ -7260,6 +7302,14 @@ create_partial_grouping_paths(PlannerInfo *root, partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent; partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine; + /* + * Partially-grouped partial paths may have been generated by eager + * aggregation. If we find that parallelism is not possible for + * partially_grouped_rel, we need to drop these partial paths. 
+ */ + if (!partially_grouped_rel->consider_parallel) + partially_grouped_rel->partial_pathlist = NIL; + /* * Build target list for partial aggregate paths. These paths cannot just * emit the same tlist as regular aggregate paths, because (1) we must diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index 4989722637..4884d9ddea 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -499,6 +499,66 @@ adjust_appendrel_attrs_mutator(Node *node, return (Node *) newinfo; } + /* + * We have to process RelAggInfo nodes specially. + */ + if (IsA(node, RelAggInfo)) + { + RelAggInfo *oldinfo = (RelAggInfo *) node; + RelAggInfo *newinfo = makeNode(RelAggInfo); + + /* Copy all flat-copiable fields */ + memcpy(newinfo, oldinfo, sizeof(RelAggInfo)); + + newinfo->relids = adjust_child_relids(oldinfo->relids, + context->nappinfos, + context->appinfos); + + newinfo->target = (PathTarget *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->target, + context); + + newinfo->agg_input = (PathTarget *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input, + context); + + newinfo->group_clauses = (List *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses, + context); + + newinfo->group_exprs = (List *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs, + context); + + return (Node *) newinfo; + } + + /* + * We have to process PathTarget nodes specially. 
+ */ + if (IsA(node, PathTarget)) + { + PathTarget *oldtarget = (PathTarget *) node; + PathTarget *newtarget = makeNode(PathTarget); + + /* Copy all flat-copiable fields */ + memcpy(newtarget, oldtarget, sizeof(PathTarget)); + + if (oldtarget->sortgrouprefs) + { + Size nbytes = list_length(oldtarget->exprs) * sizeof(Index); + + newtarget->exprs = (List *) + adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs, + context); + + newtarget->sortgrouprefs = (Index *) palloc(nbytes); + memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes); + } + + return (Node *) newtarget; + } + /* * NOTE: we do not need to recurse into sublinks, because they should * already have been converted to subplans before we see them. diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 54e042a8a5..3cb450b376 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -2702,8 +2702,7 @@ create_projection_path(PlannerInfo *root, pathnode->path.pathtype = T_Result; pathnode->path.parent = rel; pathnode->path.pathtarget = target; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe && @@ -2955,8 +2954,7 @@ create_incremental_sort_path(PlannerInfo *root, pathnode->path.parent = rel; /* Sort doesn't project, so use source path's pathtarget */ pathnode->path.pathtarget = subpath->pathtarget; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; @@ -3002,8 +3000,7 @@ create_sort_path(PlannerInfo *root, pathnode->path.parent = rel; /* Sort doesn't project, so use 
source path's pathtarget */ pathnode->path.pathtarget = subpath->pathtarget; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; @@ -3161,8 +3158,7 @@ create_agg_path(PlannerInfo *root, pathnode->path.pathtype = T_Agg; pathnode->path.parent = rel; pathnode->path.pathtarget = target; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 76e13971f7..eec678b93c 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -16,6 +16,7 @@ #include <limits.h> +#include "catalog/pg_constraint.h" #include "miscadmin.h" #include "nodes/nodeFuncs.h" #include "optimizer/appendinfo.h" @@ -27,22 +28,25 @@ #include "optimizer/paths.h" #include "optimizer/placeholder.h" #include "optimizer/plancat.h" +#include "optimizer/planner.h" #include "optimizer/restrictinfo.h" #include "optimizer/tlist.h" +#include "parser/parse_oper.h" #include "parser/parse_relation.h" #include "rewrite/rewriteManip.h" #include "utils/hsearch.h" #include "utils/lsyscache.h" +#include "utils/selfuncs.h" /* - * An entry of a hash table that we use to make lookup for RelOptInfo - * structures more efficient. + * An entry of a hash table that we use to make lookup for RelOptInfo or + * RelAggInfo structures more efficient. 
*/ typedef struct RelInfoEntry { Relids relids; /* hash key --- MUST BE FIRST */ - RelOptInfo *rel; + void *data; } RelInfoEntry; static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, @@ -87,6 +91,15 @@ static void build_child_join_reltarget(PlannerInfo *root, RelOptInfo *childrel, int nappinfos, AppendRelInfo **appinfos); +static bool eager_aggregation_possible_for_relation(PlannerInfo *root, + RelOptInfo *rel); +static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, + PathTarget *target, PathTarget *agg_input, + List **group_exprs_extra_p, + Index *maxSortGroupRef); +static bool is_var_in_aggref_only(PlannerInfo *root, Var *var); +static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel); +static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr); /* @@ -410,6 +423,101 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) return rel; } +/* + * build_simple_grouped_rel + * Construct a new RelOptInfo for a grouped base relation out of an existing + * non-grouped base relation. + * + * On success, the new RelOptInfo is returned and the corresponding RelAggInfo + * is stored in *agg_info_p. + */ +RelOptInfo * +build_simple_grouped_rel(PlannerInfo *root, int relid, + RelAggInfo **agg_info_p) +{ + RelOptInfo *rel_plain; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + /* + * We should have available aggregate expressions and grouping + * expressions, otherwise we cannot reach here. + */ + Assert(root->agg_clause_list != NIL); + Assert(root->group_expr_list != NIL); + + rel_plain = find_base_rel(root, relid); + + /* nothing to do for dummy rel */ + if (IS_DUMMY_REL(rel_plain)) + return NULL; + + /* + * Prepare the information needed to create grouped paths for this base + * relation. 
+ */ + agg_info = create_rel_agg_info(root, rel_plain); + if (agg_info == NULL) + return NULL; + + /* build a grouped relation out of the plain relation */ + rel_grouped = build_grouped_rel(root, rel_plain); + rel_grouped->reltarget = agg_info->target; + rel_grouped->rows = agg_info->grouped_rows; + + /* return the RelAggInfo structure */ + *agg_info_p = agg_info; + + return rel_grouped; +} + +/* + * build_grouped_rel + * Build a grouped relation by flat copying a plain relation and resetting + * the necessary fields. + */ +RelOptInfo * +build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain) +{ + RelOptInfo *rel_grouped; + + rel_grouped = makeNode(RelOptInfo); + memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo)); + + /* + * clear path info + */ + rel_grouped->pathlist = NIL; + rel_grouped->ppilist = NIL; + rel_grouped->partial_pathlist = NIL; + rel_grouped->cheapest_startup_path = NULL; + rel_grouped->cheapest_total_path = NULL; + rel_grouped->cheapest_unique_path = NULL; + rel_grouped->cheapest_parameterized_paths = NIL; + + /* + * clear partition info + */ + rel_grouped->part_scheme = NULL; + rel_grouped->nparts = -1; + rel_grouped->boundinfo = NULL; + rel_grouped->partbounds_merged = false; + rel_grouped->partition_qual = NIL; + rel_grouped->part_rels = NULL; + rel_grouped->live_parts = NULL; + rel_grouped->all_partrels = NULL; + rel_grouped->partexprs = NULL; + rel_grouped->nullable_partexprs = NULL; + rel_grouped->consider_partitionwise_join = false; + + /* + * clear size estimates + */ + rel_grouped->rows = 0; + + return rel_grouped; +} + /* * find_base_rel * Find a base or otherrel relation entry, which must already exist. @@ -484,7 +592,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid) /* * build_rel_hash - * Construct the auxiliary hash table for relations. + * Construct the auxiliary hash table for relation-specific entries. 
*/ static void build_rel_hash(RelInfoList *list) @@ -504,19 +612,27 @@ build_rel_hash(RelInfoList *list) &hash_ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); - /* Insert all the already-existing relations */ + /* Insert all the already-existing relation-specific entries */ foreach(l, list->items) { - RelOptInfo *rel = (RelOptInfo *) lfirst(l); + void *item = lfirst(l); RelInfoEntry *hentry; bool found; + Relids relids; + + Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo)); + + if (IsA(item, RelOptInfo)) + relids = ((RelOptInfo *) item)->relids; + else + relids = ((RelAggInfo *) item)->relids; hentry = (RelInfoEntry *) hash_search(hashtab, - &(rel->relids), + &relids, HASH_ENTER, &found); Assert(!found); - hentry->rel = rel; + hentry->data = item; } list->hash = hashtab; @@ -524,9 +640,9 @@ build_rel_hash(RelInfoList *list) /* * find_rel_info - * Find an RelOptInfo entry. + * Find a RelOptInfo or a RelAggInfo entry. */ -static RelOptInfo * +static void * find_rel_info(RelInfoList *list, Relids relids) { if (list == NULL) @@ -557,7 +673,7 @@ find_rel_info(RelInfoList *list, Relids relids) HASH_FIND, NULL); if (hentry) - return hentry->rel; + return hentry->data; } else { @@ -565,10 +681,18 @@ find_rel_info(RelInfoList *list, Relids relids) foreach(l, list->items) { - RelOptInfo *rel = (RelOptInfo *) lfirst(l); + void *item = lfirst(l); + Relids item_relids; + + Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo)); - if (bms_equal(rel->relids, relids)) - return rel; + if (IsA(item, RelOptInfo)) + item_relids = ((RelOptInfo *) item)->relids; + else + item_relids = ((RelAggInfo *) item)->relids; + + if (bms_equal(item_relids, relids)) + return item; } } @@ -583,44 +707,46 @@ find_rel_info(RelInfoList *list, Relids relids) RelOptInfo * find_join_rel(PlannerInfo *root, Relids relids) { - return find_rel_info(root->join_rel_list, relids); + return (RelOptInfo *) find_rel_info(root->join_rel_list, relids); } /* - * add_rel_info - * Add given relation 
to the given list. Also add it to the auxiliary - * hashtable if there is one. + * find_grouped_rel + * Returns relation entry corresponding to 'relids' (a set of RT indexes), + * or NULL if none exists. This is for grouped relations. + * + * If agg_info_p is not NULL, then also the corresponding RelAggInfo (if one + * exists) will be returned in *agg_info_p. */ -static void -add_rel_info(RelInfoList *list, RelOptInfo *rel) +RelOptInfo * +find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p) { - /* GEQO requires us to append the new relation to the end of the list! */ - list->items = lappend(list->items, rel); + RelOptInfo *rel; - /* store it into the auxiliary hashtable if there is one. */ - if (list->hash) + rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], + relids); + if (rel == NULL) { - RelInfoEntry *hentry; - bool found; + if (agg_info_p) + *agg_info_p = NULL; - hentry = (RelInfoEntry *) hash_search(list->hash, - &(rel->relids), - HASH_ENTER, - &found); - Assert(!found); - hentry->rel = rel; + return NULL; } -} -/* - * add_join_rel - * Add given join relation to the list of join relations in the given - * PlannerInfo. - */ -static void -add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) -{ - add_rel_info(root->join_rel_list, joinrel); + /* also return the corresponding RelAggInfo, if asked */ + if (agg_info_p) + { + RelAggInfo *agg_info; + + agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids); + + /* The relation exists, so the agg_info should be there too. */ + Assert(agg_info != NULL); + + *agg_info_p = agg_info; + } + + return rel; } /* @@ -672,6 +798,64 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel, } } +/* + * add_rel_info + * Add relation-specific entry to a list, and also add it to the auxiliary + * hashtable if there is one. 
+ */ +static void +add_rel_info(RelInfoList *list, void *data) +{ + Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo)); + + /* GEQO requires us to append the new relation to the end of the list! */ + list->items = lappend(list->items, data); + + /* store it into the auxiliary hashtable if there is one. */ + if (list->hash) + { + RelInfoEntry *hentry; + bool found; + Relids relids; + + if (IsA(data, RelOptInfo)) + relids = ((RelOptInfo *) data)->relids; + else + relids = ((RelAggInfo *) data)->relids; + + hentry = (RelInfoEntry *) hash_search(list->hash, + &relids, + HASH_ENTER, + &found); + Assert(!found); + hentry->data = data; + } +} + +/* + * add_join_rel + * Add given join relation to the list of join relations in the given + * PlannerInfo. + */ +static void +add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) +{ + add_rel_info(root->join_rel_list, joinrel); +} + +/* + * add_grouped_rel + * Add given grouped relation to the list of grouped relations in the + * given PlannerInfo. Also add the corresponding RelAggInfo to + * root->agg_info_list. 
+ */ +void +add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info) +{ + add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel); + add_rel_info(root->agg_info_list, agg_info); +} + /* * build_join_rel * Returns relation entry corresponding to the union of two given rels, @@ -1491,7 +1675,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids) /* If we already made this upperrel for the query, return it */ if (list) { - upperrel = find_rel_info(list, relids); + upperrel = (RelOptInfo *) find_rel_info(list, relids); if (upperrel) return upperrel; } @@ -2528,3 +2712,503 @@ build_child_join_reltarget(PlannerInfo *root, childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple; childrel->reltarget->width = parentrel->reltarget->width; } + +/* + * create_rel_agg_info + * Create the RelAggInfo structure for the given relation if it can produce + * grouped paths. The given relation is the non-grouped one which has the + * reltarget already constructed. + */ +RelAggInfo * +create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel) +{ + ListCell *lc; + RelAggInfo *result; + PathTarget *agg_input; + PathTarget *target; + Index maxSortGroupRef; + List *grp_exprs_extra = NIL; + List *eager_group_clauses; + int i; + + /* + * The lists of aggregate expressions and grouping expressions should have + * been constructed. + */ + Assert(root->agg_clause_list != NIL); + Assert(root->group_expr_list != NIL); + + /* + * If this is a child rel, the grouped rel for its parent rel must have + * been created if it can. So we can just use parent's RelAggInfo if + * there is one, with appropriate variable substitutions. 
+ */ + if (IS_OTHER_REL(rel)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + Assert(!bms_is_empty(rel->top_parent_relids)); + rel_grouped = find_grouped_rel(root, rel->top_parent_relids, &agg_info); + + if (rel_grouped == NULL) + return NULL; + + Assert(agg_info != NULL); + /* Must do multi-level transformation */ + agg_info = (RelAggInfo *) + adjust_appendrel_attrs_multilevel(root, + (Node *) agg_info, + rel, + rel->top_parent); + + agg_info->grouped_rows = + estimate_num_groups(root, agg_info->group_exprs, + rel->rows, NULL, NULL); + + return agg_info; + } + + /* Check if it's possible to produce grouped paths for this relation. */ + if (!eager_aggregation_possible_for_relation(root, rel)) + return NULL; + + /* + * Create targets for the grouped paths and for the input paths of the + * grouped paths. + */ + target = create_empty_pathtarget(); + agg_input = create_empty_pathtarget(); + + /* ... and initialize these targets */ + if (!init_grouping_targets(root, rel, target, agg_input, + &grp_exprs_extra, &maxSortGroupRef)) + return NULL; + + /* + * Eager aggregation is not applicable if there are no available grouping + * expressions. + */ + if (maxSortGroupRef == 0 && + list_length(grp_exprs_extra) == 0) + return NULL; + + /* + * With the current max SortGroupRef within agg_input determined, we can + * now add the expressions that are needed by upper joins to the grouping + * clauses and the targets. + */ + eager_group_clauses = list_copy(root->processed_groupClause); + foreach(lc, grp_exprs_extra) + { + Var *var = lfirst_node(Var, lc); + SortGroupClause *cl = makeNode(SortGroupClause); + + /* Initialize the SortGroupClause. 
*/ + cl->tleSortGroupRef = ++maxSortGroupRef; + get_sort_group_operators(var->vartype, + false, true, false, + &cl->sortop, &cl->eqop, NULL, + &cl->hashable); + + eager_group_clauses = lappend(eager_group_clauses, cl); + + /* This Var should be emitted by the grouped paths */ + add_column_to_pathtarget(target, (Expr *) var, + cl->tleSortGroupRef); + + /* ... and it also should be emitted by the input paths. */ + add_column_to_pathtarget(agg_input, (Expr *) var, + cl->tleSortGroupRef); + } + + /* + * Build a list of grouping expressions and a list of the corresponding + * SortGroupClauses. + */ + i = 0; + result = makeNode(RelAggInfo); + foreach(lc, target->exprs) + { + Index sortgroupref = 0; + SortGroupClause *cl; + Expr *texpr; + + texpr = (Expr *) lfirst(lc); + + Assert(IsA(texpr, Var)); + + sortgroupref = target->sortgrouprefs[i++]; + if (sortgroupref == 0) + continue; + + /* find the SortGroupClause in eager_group_clauses */ + cl = get_sortgroupref_clause(sortgroupref, eager_group_clauses); + + /* do not add this SortGroupClause if it has already been added */ + if (list_member(result->group_clauses, cl)) + continue; + + result->group_clauses = lappend(result->group_clauses, cl); + result->group_exprs = list_append_unique(result->group_exprs, + texpr); + } + + /* + * Calculate pathkeys that represent these grouping requirements. + */ + result->group_pathkeys = + make_pathkeys_for_sortclauses(root, result->group_clauses, + make_tlist_from_pathtarget(target)); + + /* + * Add aggregates to the grouping target. 
+ */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + Aggref *aggref; + + Assert(IsA(ac_info->aggref, Aggref)); + + aggref = (Aggref *) copyObject(ac_info->aggref); + mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL); + + add_column_to_pathtarget(target, (Expr *) aggref, 0); + } + + /* Set the estimated eval cost and output width for both targets */ + set_pathtarget_cost_width(root, target); + set_pathtarget_cost_width(root, agg_input); + + result->relids = bms_copy(rel->relids); + result->target = target; + result->agg_input = agg_input; + result->grouped_rows = estimate_num_groups(root, result->group_exprs, + rel->rows, NULL, NULL); + + return result; +} + +/* + * eager_aggregation_possible_for_relation + * Check if it's possible to produce grouped paths for the given relation. + */ +static bool +eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel) +{ + ListCell *lc; + int cur_relid; + + /* + * Check to see if the given relation is in the nullable side of an outer + * join. In this case, we cannot push a partial aggregation down to the + * relation, because the NULL-extended rows produced by the outer join + * would not be available when we perform the partial aggregation, while + * with a non-eager-aggregation plan these rows are available for the + * top-level aggregation. Doing so may result in the rows being grouped + * differently than expected, or produce incorrect values from the + * aggregate functions. + */ + cur_relid = -1; + while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0) + { + RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid); + + if (baserel == NULL) + continue; /* ignore outer joins in rel->relids */ + + if (!bms_is_subset(baserel->nulling_relids, rel->relids)) + return false; + } + + /* + * For now we don't try to support PlaceHolderVars. 
+ */ + foreach(lc, rel->reltarget->exprs) + { + Expr *expr = lfirst(lc); + + if (IsA(expr, PlaceHolderVar)) + return false; + } + + /* Caller should only pass base relations or joins. */ + Assert(rel->reloptkind == RELOPT_BASEREL || + rel->reloptkind == RELOPT_JOINREL); + + /* + * Check if all aggregate expressions can be evaluated on this relation + * level. + */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + + Assert(IsA(ac_info->aggref, Aggref)); + + /* + * Give up if any aggregate needs relations other than the current + * one. + * + * If the aggregate needs the current rel plus anything else, grouping + * the current rel could make some input variables unavailable for the + * higher aggregate and also reduce the number of input rows it + * receives. + * + * If the aggregate does not need the current rel at all, then the + * current rel should not be grouped, as we do not support joining two + * grouped relations. + */ + if (!bms_is_subset(ac_info->agg_eval_at, rel->relids)) + return false; + } + + return true; +} + +/* + * init_grouping_targets + * Initialize the target for grouped paths (target) as well as the target + * for paths that generate input for the grouped paths (agg_input). + * + * *group_exprs_extra_p receives a list of Var nodes for which we need to + * construct SortGroupClauses. Those Vars will then be used as additional + * grouping expressions, for the sake of join clauses. + * + * *maxSortGroupRef receives the max SortGroupRef within agg_input. + * + * Return true if the targets could be initialized, false otherwise. 
+ */ +static bool +init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, + PathTarget *target, PathTarget *agg_input, + List **group_exprs_extra_p, + Index *maxSortGroupRef) +{ + ListCell *lc; + List *possibly_dependent = NIL; + + *maxSortGroupRef = 0; + + foreach(lc, rel->reltarget->exprs) + { + Expr *expr = (Expr *) lfirst(lc); + Index sortgroupref; + + /* + * Given that PlaceHolderVar currently prevents us from doing eager + * aggregation, the source target cannot contain anything more complex + * than a Var. + */ + Assert(IsA(expr, Var)); + + /* Get the sortgroupref if the expr can act as grouping expression. */ + sortgroupref = get_expression_sortgroupref(root, expr); + if (sortgroupref > 0) + { + /* + * If the target expression can be used as a grouping key, it + * should be emitted by the grouped paths that have been pushed + * down to this relation level. + */ + add_column_to_pathtarget(target, expr, sortgroupref); + + /* + * ... and it also should be emitted by the input paths. + */ + add_column_to_pathtarget(agg_input, expr, sortgroupref); + + /* Update the max SortGroupRef */ + if (sortgroupref > *maxSortGroupRef) + *maxSortGroupRef = sortgroupref; + } + else if (is_var_needed_by_join(root, (Var *) expr, rel)) + { + /* + * The expression is needed for an upper join but is neither in + * the GROUP BY clause nor derivable from it using EC (otherwise, + * it would have already been included in the targets above). We + * need to create a special SortGroupClause for this expression. + * + * Note that its tleSortGroupRef needs to be unique within + * agg_input, so we need to postpone creation of this + * SortGroupClause until we're done with the iteration of + * rel->reltarget->exprs. 
+ */ + *group_exprs_extra_p = lappend(*group_exprs_extra_p, expr); + } + else if (is_var_in_aggref_only(root, (Var *) expr)) + { + /* + * The expression is referenced by an aggregate function pushed + * down to this relation and does not appear elsewhere in the + * targetlist or havingQual. Add it to 'agg_input' but not to + * 'target'. + */ + add_new_column_to_pathtarget(agg_input, expr); + } + else + { + /* + * The expression may be functionally dependent on other + * expressions in the target, but we cannot verify this until all + * target expressions have been constructed. + */ + possibly_dependent = lappend(possibly_dependent, expr); + } + } + + /* + * Now we can verify whether an expression is functionally dependent on + * others. + */ + foreach(lc, possibly_dependent) + { + Var *tvar; + List *deps = NIL; + RangeTblEntry *rte; + + tvar = lfirst_node(Var, lc); + rte = root->simple_rte_array[tvar->varno]; + + if (check_functional_grouping(rte->relid, tvar->varno, + tvar->varlevelsup, + target->exprs, &deps)) + { + /* + * The expression is functionally dependent on other target + * expressions, so it can be included in the targets. Since it + * will not be used as a grouping key, a sortgroupref is not + * needed for it. + */ + add_new_column_to_pathtarget(target, (Expr *) tvar); + add_new_column_to_pathtarget(agg_input, (Expr *) tvar); + } + else + { + /* + * We may arrive here with a grouping expression that is proven + * redundant by EquivalenceClass processing, such as 't1.a' in the + * query below. + * + * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a, + * t1.b; + * + * For now we just give up in this case. + */ + return false; + } + } + + return true; +} + +/* + * is_var_in_aggref_only + * Check whether the given Var appears in aggregate expressions and not + * elsewhere in the targetlist or havingQual. 
+ */ +static bool +is_var_in_aggref_only(PlannerInfo *root, Var *var) +{ + ListCell *lc; + + /* + * Search the list of aggregate expressions for the Var. + */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + List *vars; + + Assert(IsA(ac_info->aggref, Aggref)); + + if (!bms_is_member(var->varno, ac_info->agg_eval_at)) + continue; + + vars = pull_var_clause((Node *) ac_info->aggref, + PVC_RECURSE_AGGREGATES | + PVC_RECURSE_WINDOWFUNCS | + PVC_RECURSE_PLACEHOLDERS); + + if (list_member(vars, var)) + { + list_free(vars); + break; + } + + list_free(vars); + } + + return (lc != NULL && !list_member(root->tlist_vars, var)); +} + +/* + * is_var_needed_by_join + * Check if the given Var is needed by joins above the current rel. + * + * Consider pushing the aggregate avg(b.y) down to relation b for the following + * query: + * + * SELECT a.i, avg(b.y) + * FROM a JOIN b ON a.j = b.j + * GROUP BY a.i; + * + * Column b.j needs to be used as the grouping key because otherwise it cannot + * find its way to the input of the join expression. + */ +static bool +is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel) +{ + Relids relids; + int attno; + RelOptInfo *baserel; + + /* + * Note that when checking if the Var is needed by joins above, we want to + * exclude cases where the Var is only needed in the final output. So + * include "relation 0" in the check. + */ + relids = bms_copy(rel->relids); + relids = bms_add_member(relids, 0); + + baserel = find_base_rel(root, var->varno); + attno = var->varattno - baserel->min_attr; + + return bms_nonempty_difference(baserel->attr_needed[attno], relids); +} + +/* + * get_expression_sortgroupref + * Return sortgroupref if the given 'expr' can be used as a grouping key in + * grouped paths for base or join relations, or 0 otherwise. + * + * We first check if 'expr' is among the grouping expressions. 
If it is not, + * we then check if 'expr' is known equal to any of the grouping expressions + * due to equivalence relationships. + */ +static Index +get_expression_sortgroupref(PlannerInfo *root, Expr *expr) +{ + ListCell *lc; + + foreach(lc, root->group_expr_list) + { + GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc); + + Assert(IsA(ge_info->expr, Var)); + + if (equal(ge_info->expr, expr) || + exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr, + ge_info->btree_opfamily)) + { + Assert(ge_info->sortgroupref > 0); + + return ge_info->sortgroupref; + } + } + + /* The expression cannot be used as a grouping key. */ + return 0; +} diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 79ecaa4c4c..d3d86a108a 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] = false, NULL, NULL, NULL }, + { + {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD, + gettext_noop("Enables eager aggregation."), + NULL, + GUC_EXPLAIN + }, + &enable_eager_aggregate, + false, + NULL, NULL, NULL + }, { {"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD, gettext_noop("Enables the planner's use of parallel append plans."), diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 667e0dc40a..2e9df56cf4 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -413,6 +413,7 @@ #enable_sort = on #enable_tidscan = on #enable_group_by_reordering = on +#enable_eager_aggregate = off # - Planner Cost Constants - diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 1951ae7c11..815c14c71d 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -387,6 +387,15 @@ struct PlannerInfo /* list of PlaceHolderInfos */ List *placeholder_list; + /* list of AggClauseInfos */ + List 
*agg_clause_list; + + /* list of GroupExprInfos */ + List *group_expr_list; + + /* list of plain Vars contained in targetlist and havingQual */ + List *tlist_vars; + /* array of PlaceHolderInfos indexed by phid */ struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size)); /* allocated size of array */ @@ -429,6 +438,12 @@ struct PlannerInfo */ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); + /* + * list of grouped-relation RelAggInfos, with one instance per item of the + * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list. + */ + RelInfoList *agg_info_list; + /* Result tlists chosen by grouping_planner for upper-stage processing */ struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); @@ -1079,6 +1094,56 @@ typedef struct RelOptInfo ((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \ (rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs) +/* + * RelAggInfo + * Information needed to create grouped paths for base and join rels. + * + * "relids" is the set of relation identifiers (RT indexes). + * + * "target" is the output tlist for the grouped paths. + * + * "agg_input" is the output tlist for the paths that provide input to the + * grouped paths. One difference from the reltarget of the non-grouped + * relation is that agg_input has its sortgrouprefs[] initialized. + * + * "grouped_rows" is the estimated number of result tuples of the grouped + * relation. + * + * "group_clauses", "group_exprs" and "group_pathkeys" are lists of + * SortGroupClauses, the corresponding grouping expressions and PathKeys + * respectively. 
+ */ +typedef struct RelAggInfo +{ + pg_node_attr(no_copy_equal, no_read, no_query_jumble) + + NodeTag type; + + /* set of base + OJ relids (rangetable indexes) */ + Relids relids; + + /* + * default result targetlist for Paths scanning this grouped relation; + * list of Vars/Exprs, cost, width + */ + struct PathTarget *target; + + /* + * the targetlist for Paths that provide input to the grouped paths + */ + struct PathTarget *agg_input; + + /* estimated number of result tuples */ + Cardinality grouped_rows; + + /* a list of SortGroupClauses */ + List *group_clauses; + /* a list of grouping expressions */ + List *group_exprs; + /* a list of PathKeys */ + List *group_pathkeys; +} RelAggInfo; + /* * IndexOptInfo * Per-index information for planning/optimization @@ -3147,6 +3212,41 @@ typedef struct MinMaxAggInfo Param *param; } MinMaxAggInfo; +/* + * The aggregate expressions that appear in targetlist and having clauses + */ +typedef struct AggClauseInfo +{ + pg_node_attr(no_read, no_query_jumble) + + NodeTag type; + + /* the Aggref expr */ + Aggref *aggref; + + /* lowest level we can evaluate this aggregate at */ + Relids agg_eval_at; +} AggClauseInfo; + +/* + * The grouping expressions that appear in grouping clauses + */ +typedef struct GroupExprInfo +{ + pg_node_attr(no_read, no_query_jumble) + + NodeTag type; + + /* the represented expression */ + Expr *expr; + + /* the tleSortGroupRef of the corresponding SortGroupClause */ + Index sortgroupref; + + /* btree opfamily defining the ordering */ + Oid btree_opfamily; +} GroupExprInfo; + /* * At runtime, PARAM_EXEC slots are used to pass values around from one plan * node to another. 
They can be used to pass values down into subqueries (for diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index f00bd55f39..d5282e916b 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -310,10 +310,18 @@ extern void setup_simple_rel_arrays(PlannerInfo *root); extern void expand_planner_arrays(PlannerInfo *root, int add_size); extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent); +extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid, + RelAggInfo **agg_info_p); +extern RelOptInfo *build_grouped_rel(PlannerInfo *root, + RelOptInfo *rel_plain); extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid); extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid); extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid); extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids); +extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, + RelAggInfo *agg_info); +extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids, + RelAggInfo **agg_info_p); extern RelOptInfo *build_join_rel(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer_rel, @@ -349,4 +357,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root, SpecialJoinInfo *sjinfo, int nappinfos, AppendRelInfo **appinfos); +extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel); #endif /* PATHNODE_H */ diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index 970499c469..9392a27a4d 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -21,6 +21,7 @@ * allpaths.c */ extern PGDLLIMPORT bool enable_geqo; +extern PGDLLIMPORT bool enable_eager_aggregate; extern PGDLLIMPORT int geqo_threshold; extern PGDLLIMPORT int min_parallel_table_scan_size; extern PGDLLIMPORT int min_parallel_index_scan_size; @@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo 
*root, RelOptInfo *rel, bool override_rows); extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows); +extern void generate_grouped_paths(PlannerInfo *root, + RelOptInfo *rel_grouped, + RelOptInfo *rel_plain, + RelAggInfo *agg_info); extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages, int max_workers); extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel, diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index aafc173792..cedcd88ebf 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root); extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist); extern void add_vars_to_targetlist(PlannerInfo *root, List *vars, Relids where_needed); +extern void setup_eager_aggregation(PlannerInfo *root); extern void find_lateral_references(PlannerInfo *root); extern void create_lateral_join_info(PlannerInfo *root); extern List *deconstruct_jointree(PlannerInfo *root); diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out new file mode 100644 index 0000000000..03ff11f8e0 --- /dev/null +++ b/src/test/regress/expected/eager_aggregate.out @@ -0,0 +1,1293 @@ +-- +-- EAGER AGGREGATION +-- Test we can push aggregation down below join +-- +-- Enable eager aggregation, which by default is disabled. 
+SET enable_eager_aggregate TO on; +CREATE TABLE eager_agg_t1 (a int, b int, c double precision); +CREATE TABLE eager_agg_t2 (a int, b int, c double precision); +CREATE TABLE eager_agg_t3 (a int, b int, c double precision); +INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i; +ANALYZE eager_agg_t1; +ANALYZE eager_agg_t2; +ANALYZE eager_agg_t3; +-- +-- Test eager aggregation over base rel +-- +-- Perform scan of a table, aggregate the result, join it to the other table +-- and finalize the aggregation. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(18 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort 
+ Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial GroupAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Sort + Output: t2.c, t2.b + Sort Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b +(21 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +RESET enable_hashagg; +-- +-- Test eager aggregation over join rel +-- +-- Perform join of tables, aggregate the result, join it to the other table +-- and finalize the aggregation. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg((t2.c + t3.c)) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg((t2.c + t3.c))) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg((t2.c + t3.c)) + Group Key: t2.b + -> Hash Join + Output: t2.c, t3.c, t2.b + Hash Cond: (t3.a = t2.a) + -> Seq Scan on public.eager_agg_t3 t3 + Output: t3.a, t3.b, t3.c + -> Hash + Output: t2.c, t2.b, t2.a + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b, t2.a +(25 rows) + +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + a | 
avg +---+----- + 1 | 497 + 2 | 499 + 3 | 501 + 4 | 503 + 5 | 505 + 6 | 507 + 7 | 509 + 8 | 511 + 9 | 513 +(9 rows) + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg((t2.c + t3.c)) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg((t2.c + t3.c))) + -> Partial GroupAggregate + Output: t2.b, PARTIAL avg((t2.c + t3.c)) + Group Key: t2.b + -> Sort + Output: t2.c, t3.c, t2.b + Sort Key: t2.b + -> Hash Join + Output: t2.c, t3.c, t2.b + Hash Cond: (t3.a = t2.a) + -> Seq Scan on public.eager_agg_t3 t3 + Output: t3.a, t3.b, t3.c + -> Hash + Output: t2.c, t2.b, t2.a + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b, t2.a +(28 rows) + +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 497 + 2 | 499 + 3 | 501 + 4 | 503 + 5 | 505 + 6 | 507 + 7 | 509 + 8 | 511 + 9 | 513 +(9 rows) + +RESET enable_hashagg; +-- +-- Test that eager aggregation works for outer join +-- +-- Ensure aggregation can be pushed down to the non-nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + 
Sort Key: t1.a + -> Hash Right Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(18 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 + | 505 +(10 rows) + +-- Ensure aggregation cannot be pushed down to the nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; + QUERY PLAN +------------------------------------------------------------ + Sort + Output: t2.b, (avg(t2.c)) + Sort Key: t2.b + -> HashAggregate + Output: t2.b, avg(t2.c) + Group Key: t2.b + -> Hash Right Join + Output: t2.b, t2.c + Hash Cond: (t2.b = t1.b) + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c + -> Hash + Output: t1.b + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.b +(15 rows) + +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; + b | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 + | +(10 rows) + +-- +-- Test that eager aggregation works for parallel plans +-- +SET parallel_setup_cost=0; +SET parallel_tuple_cost=0; +SET min_parallel_table_scan_size=0; +SET max_parallel_workers_per_gather=4; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +--------------------------------------------------------------------------------- + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + 
-> Gather Merge + Output: t1.a, (PARTIAL avg(t2.c)) + Workers Planned: 2 + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Parallel Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Parallel Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Parallel Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Parallel Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(21 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +RESET parallel_setup_cost; +RESET parallel_tuple_cost; +RESET min_parallel_table_scan_size; +RESET max_parallel_workers_per_gather; +DROP TABLE eager_agg_t1; +DROP TABLE eager_agg_t2; +DROP TABLE eager_agg_t3; +-- +-- Test eager aggregation for partitionwise join +-- +-- Enable partitionwise aggregate, which by default is disabled. +SET enable_partitionwise_aggregate TO true; +-- Enable partitionwise join, which by default is disabled. 
+SET enable_partitionwise_join TO true; +CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20); +CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30); +CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y); +CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20); +CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30); +INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i; +INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i; +ANALYZE eager_agg_tab1; +ANALYZE eager_agg_tab2; +-- When GROUP BY clause matches; full aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum(t1.y)), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum(t1.y), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + Hash Cond: (t2.y = t1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2 + Output: t2.y + -> Hash + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*) + Group Key: t1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.x, t1.y + -> Finalize HashAggregate + Output: t1_1.x, sum(t1_1.y), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_1 + Output: t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.x, t1_1.y + -> Finalize HashAggregate + Output: t1_2.x, sum(t1_2.y), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_2 + Output: t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.x, t1_2.y +(49 rows) + +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+------+------- + 0 | 500 | 100 + 
6 | 1100 | 100 + 12 | 700 | 100 + 18 | 1300 | 100 + 24 | 900 | 100 +(5 rows) + +-- GROUP BY having other matching key +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t2.y, (sum(t1.y)), (count(*)) + Sort Key: t2.y + -> Append + -> Finalize HashAggregate + Output: t2.y, sum(t1.y), count(*) + Group Key: t2.y + -> Hash Join + Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + Hash Cond: (t2.y = t1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2 + Output: t2.y + -> Hash + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*) + Group Key: t1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.y, t1.x + -> Finalize HashAggregate + Output: t2_1.y, sum(t1_1.y), count(*) + Group Key: t2_1.y + -> Hash Join + Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_1 + Output: t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.y, t1_1.x + -> Finalize HashAggregate + Output: t2_2.y, sum(t1_2.y), count(*) + Group Key: t2_2.y + -> Hash Join + Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_2 + Output: t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.y, t1_2.x +(49 rows) + +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, 
eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; + y | sum | count +----+------+------- + 0 | 500 | 100 + 6 | 1100 | 100 + 12 | 700 | 100 + 18 | 1300 | 100 + 24 | 900 | 100 +(5 rows) + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; + QUERY PLAN +------------------------------------------------------------------------------------------------------------ + Sort + Output: t2.x, (sum(t1.x)), (count(*)) + Sort Key: t2.x + -> Finalize HashAggregate + Output: t2.x, sum(t1.x), count(*) + Group Key: t2.x + Filter: (avg(t1.x) > '10'::numeric) + -> Append + -> Hash Join + Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2_1 + Output: t2_1.x, t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1_1 + Output: t1_1.x + -> Hash Join + Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_2 + Output: t2_2.x, t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_2 + Output: t1_2.x + -> Hash Join + Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x)) + Hash Cond: (t2_3.y = t1_3.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_3 + Output: t2_3.x, t2_3.y + -> Hash + Output: t1_3.x, (PARTIAL 
sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x)) + -> Partial HashAggregate + Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x) + Group Key: t1_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_3 + Output: t1_3.x +(44 rows) + +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; + x | sum | count +----+------+------- + 2 | 600 | 50 + 4 | 1200 | 50 + 8 | 900 | 50 + 12 | 600 | 50 + 14 | 1200 | 50 + 18 | 900 | 50 +(6 rows) + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +------------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum((t2.y + t3.y))) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum((t2.y + t3.y)) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum((t2.y + t3.y))) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))) + -> Partial HashAggregate + Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)) + Group Key: t2.x + -> Hash Join + Output: t2.y, t2.x, t3.y, t3.x + Hash Cond: (t2.x = t3.x) + -> Seq Scan on public.eager_agg_tab1_p1 t2 + Output: t2.y, t2.x + -> Hash + Output: t3.y, t3.x + -> Seq Scan on public.eager_agg_tab1_p1 t3 + Output: t3.y, t3.x + -> Finalize HashAggregate + Output: t1_1.x, sum((t2_1.y + t3_1.y)) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))) + -> Partial HashAggregate + Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + 
t3_1.y)) + Group Key: t2_1.x + -> Hash Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab1_p2 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t3_1 + Output: t3_1.y, t3_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum((t2_2.y + t3_2.y)) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))) + -> Partial HashAggregate + Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)) + Group Key: t2_2.x + -> Hash Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab1_p3 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, t3_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t3_2 + Output: t3_2.y, t3_2.x +(70 rows) + +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + x | sum +----+------- + 0 | 10000 + 2 | 14000 + 4 | 18000 + 6 | 22000 + 8 | 26000 + 10 | 10000 + 12 | 14000 + 14 | 18000 + 16 | 22000 + 18 | 26000 + 20 | 10000 + 22 | 14000 + 24 | 18000 + 26 | 22000 + 28 | 26000 +(15 rows) + +-- partial aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + QUERY PLAN +------------------------------------------------------------------------------------------- + Finalize GroupAggregate + Output: t3.y, sum((t2.y + t3.y)) + Group Key: t3.y + -> Sort + Output: t3.y, (PARTIAL sum((t2.y + t3.y))) + Sort Key: t3.y + -> Append + -> Hash Join + Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))) + Hash Cond: (t2_1.x = t1_1.x) + -> 
Partial GroupAggregate + Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)) + Group Key: t3_1.y, t2_1.x, t3_1.x + -> Sort + Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x + Sort Key: t3_1.y, t2_1.x + -> Hash Join + Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab1_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t3_1 + Output: t3_1.y, t3_1.x + -> Hash + Output: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1_1 + Output: t1_1.x + -> Hash Join + Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))) + Hash Cond: (t2_2.x = t1_2.x) + -> Partial GroupAggregate + Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)) + Group Key: t3_2.y, t2_2.x, t3_2.x + -> Sort + Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x + Sort Key: t3_2.y, t2_2.x + -> Hash Join + Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab1_p2 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, t3_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t3_2 + Output: t3_2.y, t3_2.x + -> Hash + Output: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_2 + Output: t1_2.x + -> Hash Join + Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))) + Hash Cond: (t2_3.x = t1_3.x) + -> Partial GroupAggregate + Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)) + Group Key: t3_3.y, t2_3.x, t3_3.x + -> Sort + Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x + Sort Key: t3_3.y, t2_3.x + -> Hash Join + Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x + Hash Cond: (t2_3.x = t3_3.x) + -> Seq Scan on public.eager_agg_tab1_p3 t2_3 + Output: t2_3.y, t2_3.x + -> Hash + Output: t3_3.y, t3_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t3_3 + Output: t3_3.y, t3_3.x + -> Hash + Output: t1_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_3 + Output: t1_3.x +(73 rows) + +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN 
eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + y | sum +----+------- + 0 | 7500 + 2 | 13500 + 4 | 19500 + 6 | 25500 + 8 | 31500 + 10 | 22500 + 12 | 28500 + 14 | 34500 + 16 | 40500 + 18 | 46500 +(10 rows) + +RESET enable_hashagg; +DROP TABLE eager_agg_tab1; +DROP TABLE eager_agg_tab2; +-- +-- Test with multi-level partitioning scheme +-- +CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15); +CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20); +CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25); +CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30); +INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i; +ANALYZE eager_agg_tab_ml; +-- When GROUP BY clause matches; full aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum(t2.y)), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum(t2.y), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*)) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*) + Group Key: t2.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t2 + Output: t2.y, t2.x + -> Finalize HashAggregate + Output: t1_1.x, sum(t2_1.y), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*) + Group Key: t2_1.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1 + Output: t2_1.y, t2_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum(t2_2.y), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*) + Group Key: t2_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2 + Output: t2_2.y, t2_2.x + -> Finalize HashAggregate + Output: t1_3.x, sum(t2_3.y), count(*) + Group Key: t1_3.x + -> Hash Join + Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + 
Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3 + Output: t1_3.x + -> Hash + Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*) + Group Key: t2_3.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3 + Output: t2_3.y, t2_3.x + -> Finalize HashAggregate + Output: t1_4.x, sum(t2_4.y), count(*) + Group Key: t1_4.x + -> Hash Join + Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4 + Output: t1_4.x + -> Hash + Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*) + Group Key: t2_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4 + Output: t2_4.y, t2_4.x +(79 rows) + +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+-------+------- + 0 | 0 | 1089 + 1 | 1156 | 1156 + 2 | 2312 | 1156 + 3 | 3468 | 1156 + 4 | 4624 | 1156 + 5 | 5780 | 1156 + 6 | 6936 | 1156 + 7 | 8092 | 1156 + 8 | 9248 | 1156 + 9 | 10404 | 1156 + 10 | 11560 | 1156 + 11 | 11979 | 1089 + 12 | 13068 | 1089 + 13 | 14157 | 1089 + 14 | 15246 | 1089 + 15 | 16335 | 1089 + 16 | 17424 | 1089 + 17 | 18513 | 1089 + 18 | 19602 | 1089 + 19 | 20691 | 1089 + 20 | 21780 | 1089 + 21 | 22869 | 1089 + 22 | 23958 | 1089 + 23 | 25047 | 1089 + 24 | 26136 | 1089 + 25 | 27225 | 1089 + 26 | 28314 | 1089 + 27 | 29403 | 1089 + 28 | 30492 | 1089 + 29 | 31581 | 1089 +(30 rows) + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.y, (sum(t2.y)), (count(*)) + Sort Key: t1.y + -> Finalize HashAggregate + Output: t1.y, sum(t2.y), count(*) + Group Key: t1.y + -> Append + -> Hash Join + Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1 + Output: t1_1.y, t1_1.x + -> Hash + Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*) + Group Key: t2_1.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash Join + Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2 + Output: t1_2.y, t1_2.x + -> Hash + Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*) + Group Key: t2_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2 + Output: t2_2.y, t2_2.x + -> Hash Join + Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3 + Output: t1_3.y, t1_3.x + -> Hash + Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*) + Group Key: t2_3.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3 + Output: t2_3.y, t2_3.x + -> Hash Join + Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4 + Output: t1_4.y, t1_4.x + -> Hash + Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + -> Partial HashAggregate + 
Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*) + Group Key: t2_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4 + Output: t2_4.y, t2_4.x + -> Hash Join + Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*)) + Hash Cond: (t1_5.x = t2_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5 + Output: t1_5.y, t1_5.x + -> Hash + Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*) + Group Key: t2_5.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5 + Output: t2_5.y, t2_5.x +(67 rows) + +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; + y | sum | count +----+-------+------- + 0 | 0 | 1089 + 1 | 1156 | 1156 + 2 | 2312 | 1156 + 3 | 3468 | 1156 + 4 | 4624 | 1156 + 5 | 5780 | 1156 + 6 | 6936 | 1156 + 7 | 8092 | 1156 + 8 | 9248 | 1156 + 9 | 10404 | 1156 + 10 | 11560 | 1156 + 11 | 11979 | 1089 + 12 | 13068 | 1089 + 13 | 14157 | 1089 + 14 | 15246 | 1089 + 15 | 16335 | 1089 + 16 | 17424 | 1089 + 17 | 18513 | 1089 + 18 | 19602 | 1089 + 19 | 20691 | 1089 + 20 | 21780 | 1089 + 21 | 22869 | 1089 + 22 | 23958 | 1089 + 23 | 25047 | 1089 + 24 | 26136 | 1089 + 25 | 27225 | 1089 + 26 | 28314 | 1089 + 27 | 29403 | 1089 + 28 | 30492 | 1089 + 29 | 31581 | 1089 +(30 rows) + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +---------------------------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum((t2.y + t3.y))), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum((t2.y + t3.y)), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL 
count(*)) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*) + Group Key: t2.x + -> Hash Join + Output: t2.y, t2.x, t3.y, t3.x + Hash Cond: (t2.x = t3.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t2 + Output: t2.y, t2.x + -> Hash + Output: t3.y, t3.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t3 + Output: t3.y, t3.x + -> Finalize HashAggregate + Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*) + Group Key: t2_1.x + -> Hash Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1 + Output: t3_1.y, t3_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*) + Group Key: t2_2.x + -> Hash Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, 
t3_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2 + Output: t3_2.y, t3_2.x + -> Finalize HashAggregate + Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*) + Group Key: t1_3.x + -> Hash Join + Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3 + Output: t1_3.x + -> Hash + Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*) + Group Key: t2_3.x + -> Hash Join + Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x + Hash Cond: (t2_3.x = t3_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3 + Output: t2_3.y, t2_3.x + -> Hash + Output: t3_3.y, t3_3.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3 + Output: t3_3.y, t3_3.x + -> Finalize HashAggregate + Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*) + Group Key: t1_4.x + -> Hash Join + Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4 + Output: t1_4.x + -> Hash + Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*) + Group Key: t2_4.x + -> Hash Join + Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x + Hash Cond: (t2_4.x = t3_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4 + Output: t2_4.y, t2_4.x + -> Hash + Output: t3_4.y, t3_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4 + Output: t3_4.y, t3_4.x +(114 rows) + +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+---------+------- + 0 | 0 | 35937 + 1 | 78608 | 39304 + 2 | 157216 | 39304 + 3 | 235824 | 39304 + 4 | 314432 | 39304 + 5 | 393040 | 39304 + 6 | 471648 | 39304 
+ 7 | 550256 | 39304 + 8 | 628864 | 39304 + 9 | 707472 | 39304 + 10 | 786080 | 39304 + 11 | 790614 | 35937 + 12 | 862488 | 35937 + 13 | 934362 | 35937 + 14 | 1006236 | 35937 + 15 | 1078110 | 35937 + 16 | 1149984 | 35937 + 17 | 1221858 | 35937 + 18 | 1293732 | 35937 + 19 | 1365606 | 35937 + 20 | 1437480 | 35937 + 21 | 1509354 | 35937 + 22 | 1581228 | 35937 + 23 | 1653102 | 35937 + 24 | 1724976 | 35937 + 25 | 1796850 | 35937 + 26 | 1868724 | 35937 + 27 | 1940598 | 35937 + 28 | 2012472 | 35937 + 29 | 2084346 | 35937 +(30 rows) + +-- partial aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------ + Sort + Output: t3.y, (sum((t2.y + t3.y))), (count(*)) + Sort Key: t3.y + -> Finalize HashAggregate + Output: t3.y, sum((t2.y + t3.y)), count(*) + Group Key: t3.y + -> Append + -> Hash Join + Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1 + Output: t1_1.x + -> Hash + Output: t3_1.y, t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t3_1.y, t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*) + Group Key: t3_1.y, t2_1.x, t3_1.x + -> Hash Join + Output: t2_1.y, t3_1.y, t2_1.x, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1 + Output: t3_1.y, t3_1.x + -> Hash Join + Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2 + Output: t1_2.x + -> Hash + Output: t3_2.y, t2_2.x, t3_2.x, 
(PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t3_2.y, t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*) + Group Key: t3_2.y, t2_2.x, t3_2.x + -> Hash Join + Output: t2_2.y, t3_2.y, t2_2.x, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, t3_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2 + Output: t3_2.y, t3_2.x + -> Hash Join + Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3 + Output: t1_3.x + -> Hash + Output: t3_3.y, t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t3_3.y, t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*) + Group Key: t3_3.y, t2_3.x, t3_3.x + -> Hash Join + Output: t2_3.y, t3_3.y, t2_3.x, t3_3.x + Hash Cond: (t2_3.x = t3_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3 + Output: t2_3.y, t2_3.x + -> Hash + Output: t3_3.y, t3_3.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3 + Output: t3_3.y, t3_3.x + -> Hash Join + Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4 + Output: t1_4.x + -> Hash + Output: t3_4.y, t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t3_4.y, t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*) + Group Key: t3_4.y, t2_4.x, t3_4.x + -> Hash Join + Output: t2_4.y, t3_4.y, t2_4.x, t3_4.x + Hash Cond: (t2_4.x = t3_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4 + Output: t2_4.y, t2_4.x + -> Hash + Output: t3_4.y, t3_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4 + Output: t3_4.y, t3_4.x + -> Hash Join + Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*)) + Hash Cond: (t1_5.x = 
t2_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5 + Output: t1_5.x + -> Hash + Output: t3_5.y, t2_5.x, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t3_5.y, t2_5.x, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*) + Group Key: t3_5.y, t2_5.x, t3_5.x + -> Hash Join + Output: t2_5.y, t3_5.y, t2_5.x, t3_5.x + Hash Cond: (t2_5.x = t3_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5 + Output: t2_5.y, t2_5.x + -> Hash + Output: t3_5.y, t3_5.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5 + Output: t3_5.y, t3_5.x +(102 rows) + +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + y | sum | count +----+---------+------- + 0 | 0 | 35937 + 1 | 78608 | 39304 + 2 | 157216 | 39304 + 3 | 235824 | 39304 + 4 | 314432 | 39304 + 5 | 393040 | 39304 + 6 | 471648 | 39304 + 7 | 550256 | 39304 + 8 | 628864 | 39304 + 9 | 707472 | 39304 + 10 | 786080 | 39304 + 11 | 790614 | 35937 + 12 | 862488 | 35937 + 13 | 934362 | 35937 + 14 | 1006236 | 35937 + 15 | 1078110 | 35937 + 16 | 1149984 | 35937 + 17 | 1221858 | 35937 + 18 | 1293732 | 35937 + 19 | 1365606 | 35937 + 20 | 1437480 | 35937 + 21 | 1509354 | 35937 + 22 | 1581228 | 35937 + 23 | 1653102 | 35937 + 24 | 1724976 | 35937 + 25 | 1796850 | 35937 + 26 | 1868724 | 35937 + 27 | 1940598 | 35937 + 28 | 2012472 | 35937 + 29 | 2084346 | 35937 +(30 rows) + +DROP TABLE eager_agg_tab_ml; diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out index fad7fc3a7e..1dda69e7c2 100644 --- a/src/test/regress/expected/sysviews.out +++ b/src/test/regress/expected/sysviews.out @@ -150,6 +150,7 @@ select name, setting from pg_settings where name like 'enable%'; --------------------------------+--------- enable_async_append | on enable_bitmapscan | on + enable_eager_aggregate | off enable_gathermerge | on 
enable_group_by_reordering | on enable_hashagg | on @@ -170,7 +171,7 @@ select name, setting from pg_settings where name like 'enable%'; enable_seqscan | on enable_sort | on enable_tidscan | on -(22 rows) +(23 rows) -- There are always wait event descriptions for various types. InjectionPoint -- may be present or absent, depending on history since last postmaster start. diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 2429ec2bba..d5697e5655 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr # The stats test resets stats, so nothing else needing stats access can be in # this group. # ---------- -test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate +test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate # event_trigger depends on create_am and cannot run concurrently with # any test that runs DDL diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql new file mode 100644 index 0000000000..4050e4df44 --- /dev/null +++ b/src/test/regress/sql/eager_aggregate.sql @@ -0,0 +1,192 @@ +-- +-- EAGER AGGREGATION +-- Test we can push aggregation down below join +-- + +-- Enable eager aggregation, which by default is disabled. 
+SET enable_eager_aggregate TO on; + +CREATE TABLE eager_agg_t1 (a int, b int, c double precision); +CREATE TABLE eager_agg_t2 (a int, b int, c double precision); +CREATE TABLE eager_agg_t3 (a int, b int, c double precision); + +INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i; + +ANALYZE eager_agg_t1; +ANALYZE eager_agg_t2; +ANALYZE eager_agg_t3; + + +-- +-- Test eager aggregation over base rel +-- + +-- Perform scan of a table, aggregate the result, join it to the other table +-- and finalize the aggregation. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; + +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + +RESET enable_hashagg; + + +-- +-- Test eager aggregation over join rel +-- + +-- Perform join of tables, aggregate the result, join it to the other table +-- and finalize the aggregation. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; + +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + +RESET enable_hashagg; + + +-- +-- Test that eager aggregation works for outer join +-- + +-- Ensure aggregation can be pushed down to the non-nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + +-- Ensure aggregation cannot be pushed down to the nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; + + +-- +-- Test that eager aggregation works for parallel plans +-- + +SET parallel_setup_cost=0; +SET parallel_tuple_cost=0; +SET min_parallel_table_scan_size=0; +SET max_parallel_workers_per_gather=4; + +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + +RESET parallel_setup_cost; +RESET 
parallel_tuple_cost; +RESET min_parallel_table_scan_size; +RESET max_parallel_workers_per_gather; + + +DROP TABLE eager_agg_t1; +DROP TABLE eager_agg_t2; +DROP TABLE eager_agg_t3; + + +-- +-- Test eager aggregation for partitionwise join +-- + +-- Enable partitionwise aggregate, which by default is disabled. +SET enable_partitionwise_aggregate TO true; +-- Enable partitionwise join, which by default is disabled. +SET enable_partitionwise_join TO true; + +CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20); +CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30); +CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y); +CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20); +CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30); +INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i; +INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i; + +ANALYZE eager_agg_tab1; +ANALYZE eager_agg_tab2; + +-- When GROUP BY clause matches; full aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; + +-- GROUP BY having other matching key +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + +-- partial aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; +RESET enable_hashagg; + +DROP TABLE eager_agg_tab1; +DROP TABLE eager_agg_tab2; + + +-- +-- Test with multi-level partitioning scheme +-- +CREATE TABLE 
eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15); +CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20); +CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25); +CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30); +INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i; + +ANALYZE eager_agg_tab_ml; + +-- When GROUP BY clause matches; full aggregation is performed for each partition. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + +-- partial aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + +DROP TABLE eager_agg_tab_ml; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 5255160212..347c82fe1a 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -41,6 +41,7 @@ AfterTriggersTableData AfterTriggersTransData Agg AggClauseCosts +AggClauseInfo AggInfo AggPath AggSplit @@ -1060,6 +1061,7 @@ GrantTargetType Group GroupByOrdering GroupClause +GroupExprInfo GroupPath GroupPathExtraData GroupResultPath @@ -2370,6 +2372,7 @@ ReindexObjectType ReindexParams ReindexStmt ReindexType +RelAggInfo RelFileLocator RelFileLocatorBackend RelFileNumber @@ -2378,6 +2381,7 @@ RelInfo RelInfoArr RelInfoEntry RelInfoList +RelInfoListInfo RelMapFile RelMapping RelOptInfo -- 2.43.0 ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3
From: Richard Guo @ 2024-08-21 07:10 UTC
To: Paul George <[email protected]>; +Cc: Andy Fan <[email protected]>; pgsql-hackers; [email protected]

On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <[email protected]> wrote:
> I had a self-review of this patchset and made some refactoring,
> especially to the function that creates the RelAggInfo structure for a
> given relation. While there were no major changes, the code should
> now be simpler.

I found a bug in the v10 patchset: when we generate the GROUP BY clauses for the partial aggregation that is pushed down to a non-aggregated relation, we may produce a clause with a tleSortGroupRef that duplicates one already present in the query's groupClause, which would cause problems.

Attached is the updated version of the patchset that fixes this bug and includes further code refactoring.

Thanks
Richard

Attachments:
[application/octet-stream] v11-0001-Introduce-RelInfoList-structure.patch (14.8K)

From 7ebfb64cd28e0ada3cc62065bfe8e9cdf685b497 Mon Sep 17 00:00:00 2001
From: Richard Guo <[email protected]>
Date: Tue, 11 Jun 2024 15:59:19 +0900
Subject: [PATCH v11 1/2] Introduce RelInfoList structure

This commit introduces the RelInfoList structure, which encapsulates both a list and a hash table, so that we can leverage the hash table for faster lookups not only for join relations but also for upper relations.
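(Editorial illustration, not part of the patch.) The list-plus-hash idea in the commit message can be sketched in standalone C roughly as follows. This is only a minimal sketch under stated assumptions: a 64-bit mask stands in for Relids, a small fixed-size open-addressing table stands in for PostgreSQL's dynahash, and every name here (DemoRel, DemoRelList, demo_add, demo_find, MAX_ITEMS, NBUCKETS) is hypothetical. Only the "switch to hashing once the list exceeds 32 entries" rule and the "always append to the list" rule are taken from find_rel_info() and add_rel_info() in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define HASH_THRESHOLD 32	/* mirrors the "> 32" test in find_rel_info() */
#define MAX_ITEMS 256		/* hypothetical fixed capacity of this sketch */
#define NBUCKETS 512		/* power of two and > MAX_ITEMS, so probing ends */

typedef struct DemoRel
{
	uint64_t	relids;		/* stand-in for a Relids bitmapset (lookup key) */
	int			payload;
} DemoRel;

typedef struct DemoRelList
{
	DemoRel    *items[MAX_ITEMS];	/* always maintained, append-only */
	int			nitems;
	DemoRel   **hash;		/* NULL until the list grows "too long" */
} DemoRelList;

/* Map a key to one of NBUCKETS slots (top 9 bits of a multiplicative hash). */
static size_t
slot_for(uint64_t key)
{
	return (size_t) ((key * 0x9E3779B97F4A7C15ULL) >> 55);
}

/* Insert into the open-addressing table using linear probing. */
static void
hash_insert(DemoRel **table, DemoRel *rel)
{
	size_t		i = slot_for(rel->relids);

	while (table[i] != NULL)
	{
		assert(table[i]->relids != rel->relids);	/* no duplicate keys */
		i = (i + 1) & (NBUCKETS - 1);
	}
	table[i] = rel;
}

/* Append to the list; also store in the hash table if one already exists. */
static void
demo_add(DemoRelList *list, DemoRel *rel)
{
	assert(list->nitems < MAX_ITEMS);
	list->items[list->nitems++] = rel;
	if (list->hash)
		hash_insert(list->hash, rel);
}

/* Look up by key: linear scan while small, hash probe once past threshold. */
static DemoRel *
demo_find(DemoRelList *list, uint64_t relids)
{
	if (!list->hash && list->nitems > HASH_THRESHOLD)
	{
		/* Build the table lazily from the entries already in the list. */
		list->hash = calloc(NBUCKETS, sizeof(DemoRel *));
		assert(list->hash != NULL);
		for (int i = 0; i < list->nitems; i++)
			hash_insert(list->hash, list->items[i]);
	}

	if (list->hash)
	{
		size_t		i = slot_for(relids);

		while (list->hash[i] != NULL)
		{
			if (list->hash[i]->relids == relids)
				return list->hash[i];
			i = (i + 1) & (NBUCKETS - 1);
		}
		return NULL;
	}

	for (int i = 0; i < list->nitems; i++)
	{
		if (list->items[i]->relids == relids)
			return list->items[i];
	}
	return NULL;
}
```

Because the list stays authoritative and entries are only ever appended, a caller can discard the hash pointer and truncate the list to an earlier length to undo additions, which is the property the GEQO save/restore code in the patch appears to rely on.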
--- contrib/postgres_fdw/postgres_fdw.c | 3 +- src/backend/optimizer/geqo/geqo_eval.c | 20 +-- src/backend/optimizer/path/allpaths.c | 7 +- src/backend/optimizer/plan/planmain.c | 5 +- src/backend/optimizer/util/relnode.c | 164 ++++++++++++++----------- src/include/nodes/pathnodes.h | 32 +++-- src/tools/pgindent/typedefs.list | 3 +- 7 files changed, 136 insertions(+), 98 deletions(-) diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index fc65d81e21..be4038f64f 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -6079,7 +6079,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype, */ Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */ fpinfo->relation_index = - list_length(root->parse->rtable) + list_length(root->join_rel_list); + list_length(root->parse->rtable) + + list_length(root->join_rel_list->items); return true; } diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c index d2f7f4e5f3..1141156899 100644 --- a/src/backend/optimizer/geqo/geqo_eval.c +++ b/src/backend/optimizer/geqo/geqo_eval.c @@ -85,18 +85,18 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) * truncating the list to its original length. NOTE this assumes that any * added entries are appended at the end! * - * We also must take care not to mess up the outer join_rel_hash, if there - * is one. We can do this by just temporarily setting the link to NULL. - * (If we are dealing with enough join rels, which we very likely are, a - * new hash table will get built and used locally.) + * We also must take care not to mess up the outer join_rel_list->hash, if + * there is one. We can do this by just temporarily setting the link to + * NULL. (If we are dealing with enough join rels, which we very likely + * are, a new hash table will get built and used locally.) * * join_rel_level[] shouldn't be in use, so just Assert it isn't. 
*/ - savelength = list_length(root->join_rel_list); - savehash = root->join_rel_hash; + savelength = list_length(root->join_rel_list->items); + savehash = root->join_rel_list->hash; Assert(root->join_rel_level == NULL); - root->join_rel_hash = NULL; + root->join_rel_list->hash = NULL; /* construct the best path for the given combination of relations */ joinrel = gimme_tree(root, tour, num_gene); @@ -121,9 +121,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) * Restore join_rel_list to its former state, and put back original * hashtable if any. */ - root->join_rel_list = list_truncate(root->join_rel_list, - savelength); - root->join_rel_hash = savehash; + root->join_rel_list->items = list_truncate(root->join_rel_list->items, + savelength); + root->join_rel_list->hash = savehash; /* release all the memory acquired within gimme_tree */ MemoryContextSwitchTo(oldcxt); diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 057b4b79eb..b550e707a4 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -3410,9 +3410,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist) * needed for these paths need have been instantiated. * * Note to plugin authors: the functions invoked during standard_join_search() - * modify root->join_rel_list and root->join_rel_hash. If you want to do more - * than one join-order search, you'll probably need to save and restore the - * original states of those data structures. See geqo_eval() for an example. + * modify root->join_rel_list->items and root->join_rel_list->hash. If you + * want to do more than one join-order search, you'll probably need to save and + * restore the original states of those data structures. See geqo_eval() for + * an example. 
*/ RelOptInfo * standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index e17d31a5c3..fd8b2b0ca3 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -64,8 +64,9 @@ query_planner(PlannerInfo *root, * NOTE: append_rel_list was set up by subquery_planner, so do not touch * here. */ - root->join_rel_list = NIL; - root->join_rel_hash = NULL; + root->join_rel_list = makeNode(RelInfoList); + root->join_rel_list->items = NIL; + root->join_rel_list->hash = NULL; root->join_rel_level = NULL; root->join_cur_level = 0; root->canon_pathkeys = NIL; diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index d7266e4cdb..76e13971f7 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -35,11 +35,15 @@ #include "utils/lsyscache.h" -typedef struct JoinHashEntry +/* + * An entry of a hash table that we use to make lookup for RelOptInfo + * structures more efficient. + */ +typedef struct RelInfoEntry { - Relids join_relids; /* hash key --- MUST BE FIRST */ - RelOptInfo *join_rel; -} JoinHashEntry; + Relids relids; /* hash key --- MUST BE FIRST */ + RelOptInfo *rel; +} RelInfoEntry; static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *input_rel, @@ -479,11 +483,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid) } /* - * build_join_rel_hash - * Construct the auxiliary hash table for join relations. + * build_rel_hash + * Construct the auxiliary hash table for relations. 
*/ static void -build_join_rel_hash(PlannerInfo *root) +build_rel_hash(RelInfoList *list) { HTAB *hashtab; HASHCTL hash_ctl; @@ -491,47 +495,49 @@ build_join_rel_hash(PlannerInfo *root) /* Create the hash table */ hash_ctl.keysize = sizeof(Relids); - hash_ctl.entrysize = sizeof(JoinHashEntry); + hash_ctl.entrysize = sizeof(RelInfoEntry); hash_ctl.hash = bitmap_hash; hash_ctl.match = bitmap_match; hash_ctl.hcxt = CurrentMemoryContext; - hashtab = hash_create("JoinRelHashTable", + hashtab = hash_create("RelHashTable", 256L, &hash_ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); - /* Insert all the already-existing joinrels */ - foreach(l, root->join_rel_list) + /* Insert all the already-existing relations */ + foreach(l, list->items) { RelOptInfo *rel = (RelOptInfo *) lfirst(l); - JoinHashEntry *hentry; + RelInfoEntry *hentry; bool found; - hentry = (JoinHashEntry *) hash_search(hashtab, - &(rel->relids), - HASH_ENTER, - &found); + hentry = (RelInfoEntry *) hash_search(hashtab, + &(rel->relids), + HASH_ENTER, + &found); Assert(!found); - hentry->join_rel = rel; + hentry->rel = rel; } - root->join_rel_hash = hashtab; + list->hash = hashtab; } /* - * find_join_rel - * Returns relation entry corresponding to 'relids' (a set of RT indexes), - * or NULL if none exists. This is for join relations. + * find_rel_info + * Find an RelOptInfo entry. */ -RelOptInfo * -find_join_rel(PlannerInfo *root, Relids relids) +static RelOptInfo * +find_rel_info(RelInfoList *list, Relids relids) { + if (list == NULL) + return NULL; + /* * Switch to using hash lookup when list grows "too long". The threshold * is arbitrary and is known only here. */ - if (!root->join_rel_hash && list_length(root->join_rel_list) > 32) - build_join_rel_hash(root); + if (!list->hash && list_length(list->items) > 32) + build_rel_hash(list); /* * Use either hashtable lookup or linear search, as appropriate. 
@@ -541,23 +547,23 @@ find_join_rel(PlannerInfo *root, Relids relids) * so would force relids out of a register and thus probably slow down the * list-search case. */ - if (root->join_rel_hash) + if (list->hash) { Relids hashkey = relids; - JoinHashEntry *hentry; + RelInfoEntry *hentry; - hentry = (JoinHashEntry *) hash_search(root->join_rel_hash, - &hashkey, - HASH_FIND, - NULL); + hentry = (RelInfoEntry *) hash_search(list->hash, + &hashkey, + HASH_FIND, + NULL); if (hentry) - return hentry->join_rel; + return hentry->rel; } else { ListCell *l; - foreach(l, root->join_rel_list) + foreach(l, list->items) { RelOptInfo *rel = (RelOptInfo *) lfirst(l); @@ -569,6 +575,54 @@ find_join_rel(PlannerInfo *root, Relids relids) return NULL; } +/* + * find_join_rel + * Returns relation entry corresponding to 'relids' (a set of RT indexes), + * or NULL if none exists. This is for join relations. + */ +RelOptInfo * +find_join_rel(PlannerInfo *root, Relids relids) +{ + return find_rel_info(root->join_rel_list, relids); +} + +/* + * add_rel_info + * Add given relation to the given list. Also add it to the auxiliary + * hashtable if there is one. + */ +static void +add_rel_info(RelInfoList *list, RelOptInfo *rel) +{ + /* GEQO requires us to append the new relation to the end of the list! */ + list->items = lappend(list->items, rel); + + /* store it into the auxiliary hashtable if there is one. */ + if (list->hash) + { + RelInfoEntry *hentry; + bool found; + + hentry = (RelInfoEntry *) hash_search(list->hash, + &(rel->relids), + HASH_ENTER, + &found); + Assert(!found); + hentry->rel = rel; + } +} + +/* + * add_join_rel + * Add given join relation to the list of join relations in the given + * PlannerInfo. 
+ */ +static void +add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) +{ + add_rel_info(root->join_rel_list, joinrel); +} + /* * set_foreign_rel_properties * Set up foreign-join fields if outer and inner relation are foreign @@ -618,32 +672,6 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel, } } -/* - * add_join_rel - * Add given join relation to the list of join relations in the given - * PlannerInfo. Also add it to the auxiliary hashtable if there is one. - */ -static void -add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) -{ - /* GEQO requires us to append the new joinrel to the end of the list! */ - root->join_rel_list = lappend(root->join_rel_list, joinrel); - - /* store it into the auxiliary hashtable if there is one. */ - if (root->join_rel_hash) - { - JoinHashEntry *hentry; - bool found; - - hentry = (JoinHashEntry *) hash_search(root->join_rel_hash, - &(joinrel->relids), - HASH_ENTER, - &found); - Assert(!found); - hentry->join_rel = joinrel; - } -} - /* * build_join_rel * Returns relation entry corresponding to the union of two given rels, @@ -1457,22 +1485,14 @@ subbuild_joinrel_joinlist(RelOptInfo *joinrel, RelOptInfo * fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids) { + RelInfoList *list = &root->upper_rels[kind]; RelOptInfo *upperrel; - ListCell *lc; - - /* - * For the moment, our indexing data structure is just a List for each - * relation kind. If we ever get so many of one kind that this stops - * working well, we can improve it. No code outside this function should - * assume anything about how to find a particular upperrel. 
- */ /* If we already made this upperrel for the query, return it */ - foreach(lc, root->upper_rels[kind]) + if (list) { - upperrel = (RelOptInfo *) lfirst(lc); - - if (bms_equal(upperrel->relids, relids)) + upperrel = find_rel_info(list, relids); + if (upperrel) return upperrel; } @@ -1491,7 +1511,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids) upperrel->cheapest_unique_path = NULL; upperrel->cheapest_parameterized_paths = NIL; - root->upper_rels[kind] = lappend(root->upper_rels[kind], upperrel); + add_rel_info(&root->upper_rels[kind], upperrel); return upperrel; } diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 14ccfc1ac1..1951ae7c11 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -80,6 +80,26 @@ typedef enum UpperRelationKind /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */ } UpperRelationKind; +/* + * A structure consisting of a list and a hash table to store relation-specific + * information. + * + * For small problems we just scan the list to do lookups, but when there are + * many relations we build a hash table for faster lookups. The hash table is + * present and valid when 'hash' is not NULL. Note that we still maintain the + * list even when using the hash table for lookups; this simplifies life for + * GEQO. + */ +typedef struct RelInfoList +{ + pg_node_attr(no_copy_equal, no_read) + + NodeTag type; + + List *items; + struct HTAB *hash pg_node_attr(read_write_ignore); +} RelInfoList; + /*---------- * PlannerGlobal * Global information for planning/optimization @@ -270,15 +290,9 @@ struct PlannerInfo /* * join_rel_list is a list of all join-relation RelOptInfos we have - * considered in this planning run. For small problems we just scan the - * list to do lookups, but when there are many join relations we build a - * hash table for faster lookups. The hash table is present and valid - * when join_rel_hash is not NULL. 
Note that we still maintain the list - * even when using the hash table for lookups; this simplifies life for - * GEQO. + * considered in this planning run. */ - List *join_rel_list; - struct HTAB *join_rel_hash pg_node_attr(read_write_ignore); + RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */ /* * When doing a dynamic-programming-style join search, join_rel_level[k] @@ -413,7 +427,7 @@ struct PlannerInfo * Upper-rel RelOptInfos. Use fetch_upper_rel() to get any particular * upper rel. */ - List *upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); + RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); /* Result tlists chosen by grouping_planner for upper-stage processing */ struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 6d424c8918..502c748ecd 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -1292,7 +1292,6 @@ Join JoinCostWorkspace JoinDomain JoinExpr -JoinHashEntry JoinPath JoinPathExtraData JoinState @@ -2378,6 +2377,8 @@ RelFileNumber RelIdCacheEnt RelInfo RelInfoArr +RelInfoEntry +RelInfoList RelMapFile RelMapping RelOptInfo -- 2.43.0 [application/octet-stream] v11-0002-Implement-Eager-Aggregation.patch (162.0K, 3-v11-0002-Implement-Eager-Aggregation.patch) download | inline diff: From 2af51976b33edfbe7d3c28d84c270366718e4a06 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Tue, 11 Jun 2024 16:01:26 +0900 Subject: [PATCH v11 2/2] Implement Eager Aggregation Eager aggregation is a query optimization technique that partially pushes aggregation past a join, and finalizes it once all the relations are joined. Eager aggregation may reduce the number of input rows to the join and thus could result in a better overall plan. 
A plan with eager aggregation looks like: EXPLAIN (COSTS OFF) SELECT a.i, avg(b.y) FROM a JOIN b ON a.i = b.j GROUP BY a.i; Finalize HashAggregate Group Key: a.i -> Nested Loop -> Partial HashAggregate Group Key: b.j -> Seq Scan on b -> Index Only Scan using a_pkey on a Index Cond: (i = b.j) During the construction of the join tree, we evaluate each base or join relation to determine if eager aggregation can be applied. If feasible, we create a separate RelOptInfo called a "grouped relation" and store it in root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG]. Grouped relation paths can be generated in two ways. The first method involves adding sorted and hashed partial aggregation paths on top of the non-grouped paths. To limit planning time, we only consider the cheapest or suitably-sorted non-grouped paths during this phase. Alternatively, grouped paths can be generated by joining a grouped relation with a non-grouped relation. Joining two grouped relations does not seem to be very useful and is currently not supported. For the partial aggregation that is pushed down to a non-aggregated relation, we need to consider all expressions from this relation that are involved in upper join clauses and include them in the grouping keys. This ensures that we have the correct input for the upper joins and that an aggregated row from the partial aggregation matches the other side of the join if and only if each row in the partial group does, which is crucial for maintaining correctness. One restriction is that we cannot push partial aggregation down to a relation that is in the nullable side of an outer join, because the NULL-extended rows produced by the outer join would not be available when we perform the partial aggregation, while with a non-eager-aggregation plan these rows are available for the top-level aggregation. 
Pushing partial aggregation in this case may result in the rows being grouped differently than expected, or produce incorrect values from the aggregate functions. If we have generated a grouped relation for the topmost join relation, we finalize its paths at the end. The final path will compete in the usual way with paths built from regular planning. Since eager aggregation can generate many upper relations of partial aggregation, we introduce a RelInfoList structure, which encapsulates both a list and a hash table, so that we can leverage the hash table for faster lookups not only for join relations but also for upper relations. Eager aggregation can use significantly more CPU time and memory than regular planning when the query involves aggregates and many joining relations. However, in some cases, the resulting plan can be much better, justifying the additional planning effort. All the same, for now, turn this feature off by default. --- src/backend/optimizer/README | 79 + src/backend/optimizer/geqo/geqo_eval.c | 104 +- src/backend/optimizer/path/allpaths.c | 441 ++++++ src/backend/optimizer/path/joinrels.c | 135 ++ src/backend/optimizer/plan/initsplan.c | 252 ++++ src/backend/optimizer/plan/planmain.c | 12 + src/backend/optimizer/plan/planner.c | 99 +- src/backend/optimizer/util/appendinfo.c | 60 + src/backend/optimizer/util/pathnode.c | 12 +- src/backend/optimizer/util/relnode.c | 737 +++++++++- src/backend/utils/misc/guc_tables.c | 10 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/nodes/pathnodes.h | 100 ++ src/include/optimizer/pathnode.h | 9 + src/include/optimizer/paths.h | 5 + src/include/optimizer/planmain.h | 1 + src/test/regress/expected/eager_aggregate.out | 1308 +++++++++++++++++ src/test/regress/expected/sysviews.out | 3 +- src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/eager_aggregate.sql | 192 +++ src/tools/pgindent/typedefs.list | 4 + 21 files changed, 3467 insertions(+), 99 deletions(-) create mode 100644 
src/test/regress/expected/eager_aggregate.out create mode 100644 src/test/regress/sql/eager_aggregate.sql diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README index 2ab4f3dbf3..6f79ef531e 100644 --- a/src/backend/optimizer/README +++ b/src/backend/optimizer/README @@ -1497,3 +1497,82 @@ breaking down aggregation or grouping over a partitioned relation into aggregation or grouping over its partitions is called partitionwise aggregation. Especially when the partition keys match the GROUP BY clause, this can be significantly faster than the regular method. + +Eager aggregation +----------------- + +Eager aggregation is a query optimization technique that partially pushes +aggregation past a join, and finalizes it once all the relations are joined. +Eager aggregation may reduce the number of input rows to the join and thus +could result in a better overall plan. + +For example: + + EXPLAIN (COSTS OFF) + SELECT a.i, avg(b.y) + FROM a JOIN b ON a.i = b.j + GROUP BY a.i; + + Finalize HashAggregate + Group Key: a.i + -> Nested Loop + -> Partial HashAggregate + Group Key: b.j + -> Seq Scan on b + -> Index Only Scan using a_pkey on a + Index Cond: (i = b.j) + +If the partial aggregation on table B significantly reduces the number of +input rows, the join above will be much cheaper, leading to a more efficient +final plan. + +For the partial aggregation that is pushed down to a non-aggregated relation, +we need to consider all expressions from this relation that are involved in +upper join clauses and include them in the grouping keys. This ensures that we +have the correct input for the upper joins and that an aggregated row from the +partial aggregation matches the other side of the join if and only if each row +in the partial group does, which is crucial for maintaining correctness. 
+ +One restriction is that we cannot push partial aggregation down to a relation +that is in the nullable side of an outer join, because the NULL-extended rows +produced by the outer join would not be available when we perform the partial +aggregation, while with a non-eager-aggregation plan these rows are available +for the top-level aggregation. Pushing partial aggregation in this case may +result in the rows being grouped differently than expected, or produce +incorrect values from the aggregate functions. + +We can also apply eager aggregation to a join: + + EXPLAIN (COSTS OFF) + SELECT a.i, avg(b.y + c.z) + FROM a JOIN b ON a.i = b.j + JOIN c ON b.j = c.i + GROUP BY a.i; + + Finalize HashAggregate + Group Key: a.i + -> Nested Loop + -> Partial HashAggregate + Group Key: b.j + -> Hash Join + Hash Cond: (b.j = c.i) + -> Seq Scan on b + -> Hash + -> Seq Scan on c + -> Index Only Scan using a_pkey on a + Index Cond: (i = b.j) + +During the construction of the join tree, we evaluate each base or join +relation to determine if eager aggregation can be applied. If feasible, we +create a separate RelOptInfo called a "grouped relation" and generate grouped +paths by adding sorted and hashed partial aggregation paths on top of the +non-grouped paths. To limit planning time, we consider only the cheapest +non-grouped paths in this step. + +Another way to generate grouped paths is to join a grouped relation with a +non-grouped relation. Joining two grouped relations does not seem to be very +useful and is currently not supported. + +If we have generated a grouped relation for the topmost join relation, we need +to finalize its paths at the end. The final path will compete in the usual way +with paths built from regular planning. 
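(Editorial illustration, not part of the patch.) The README's first example, avg(b.y) grouped by a.i over a JOIN b ON a.i = b.j, can be mimicked in a few lines of standalone C to show why aggregating b by its join column first and combining afterwards gives the same answer as joining first. This is a toy sketch under loud assumptions: keys are small non-negative integers below a hypothetical MAX_KEY, an array indexed by key plays the role of both the hash aggregate and the join lookup, and the names RowB, AvgState, plain_avg and eager_avg are invented here. It says nothing about how the planner actually builds or costs these paths.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_KEY 8			/* hypothetical: all keys lie in [0, MAX_KEY) */

typedef struct RowB
{
	int			j;			/* join column b.j */
	double		y;			/* aggregated column b.y */
} RowB;

/* Transition state of avg(): enough to finalize as sum / count. */
typedef struct AvgState
{
	double		sum;
	long		count;
} AvgState;

/* Regular plan: join every a row with every matching b row, then aggregate. */
static void
plain_avg(const int *a, size_t na, const RowB *b, size_t nb, AvgState *out)
{
	for (size_t r = 0; r < na; r++)
	{
		assert(a[r] >= 0 && a[r] < MAX_KEY);
		for (size_t s = 0; s < nb; s++)
		{
			if (a[r] == b[s].j)
			{
				out[a[r]].sum += b[s].y;
				out[a[r]].count += 1;
			}
		}
	}
}

/*
 * Eager plan: partially aggregate b grouped by the join column b.j, join
 * the (fewer) partial rows to a, then combine the partial states per a.i.
 */
static void
eager_avg(const int *a, size_t na, const RowB *b, size_t nb, AvgState *out)
{
	AvgState	partial[MAX_KEY] = {{0}};

	/* Partial aggregation below the join, keyed by b.j. */
	for (size_t s = 0; s < nb; s++)
	{
		assert(b[s].j >= 0 && b[s].j < MAX_KEY);
		partial[b[s].j].sum += b[s].y;
		partial[b[s].j].count += 1;
	}

	/* Join each a row to at most one partial group, then finalize-combine. */
	for (size_t r = 0; r < na; r++)
	{
		assert(a[r] >= 0 && a[r] < MAX_KEY);
		if (partial[a[r]].count > 0)
		{
			out[a[r]].sum += partial[a[r]].sum;
			out[a[r]].count += partial[a[r]].count;
		}
	}
}
```

Grouping the partial aggregate by b.j is what makes the join condition well defined on a partial row: every input row in a partial group has the same b.j, so the group matches an a row exactly when each of its members would. The join in eager_avg touches one partial row per key instead of one b row per match, which is the row-count reduction the README describes.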
diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c index 1141156899..b77805d27d 100644 --- a/src/backend/optimizer/geqo/geqo_eval.c +++ b/src/backend/optimizer/geqo/geqo_eval.c @@ -39,10 +39,20 @@ typedef struct int size; /* number of input relations in clump */ } Clump; +/* The original length and hashtable of a RelInfoList */ +typedef struct +{ + int savelength; + struct HTAB *savehash; +} RelInfoListInfo; + static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene, bool force); static bool desirable_join(PlannerInfo *root, RelOptInfo *outer_rel, RelOptInfo *inner_rel); +static RelInfoListInfo save_relinfolist(RelInfoList *relinfo_list); +static void restore_relinfolist(RelInfoList *relinfo_list, + RelInfoListInfo *info); /* @@ -60,8 +70,9 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) MemoryContext oldcxt; RelOptInfo *joinrel; Cost fitness; - int savelength; - struct HTAB *savehash; + RelInfoListInfo save_join_rel; + RelInfoListInfo save_grouped_rel; + RelInfoListInfo save_grouped_info; /* * Create a private memory context that will hold all temp storage @@ -78,25 +89,33 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) oldcxt = MemoryContextSwitchTo(mycontext); /* - * gimme_tree will add entries to root->join_rel_list, which may or may - * not already contain some entries. The newly added entries will be - * recycled by the MemoryContextDelete below, so we must ensure that the - * list is restored to its former state before exiting. We can do this by - * truncating the list to its original length. NOTE this assumes that any - * added entries are appended at the end! + * gimme_tree will add entries to root->join_rel_list, root->agg_info_list + * and root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], which may or may not + * already contain some entries. 
The newly added entries will be recycled + * by the MemoryContextDelete below, so we must ensure that each list of + * the RelInfoList structures is restored to its former state before + * exiting. We can do this by truncating each list to its original + * length. NOTE this assumes that any added entries are appended at the + * end! * - * We also must take care not to mess up the outer join_rel_list->hash, if - * there is one. We can do this by just temporarily setting the link to - * NULL. (If we are dealing with enough join rels, which we very likely - * are, a new hash table will get built and used locally.) + * We also must take care not to mess up the outer hash tables of the + * RelInfoList structures, if any. We can do this by just temporarily + * setting each link to NULL. (If we are dealing with enough join rels, + * which we very likely are, new hash tables will get built and used + * locally.) * * join_rel_level[] shouldn't be in use, so just Assert it isn't. */ - savelength = list_length(root->join_rel_list->items); - savehash = root->join_rel_list->hash; + save_join_rel = save_relinfolist(root->join_rel_list); + save_grouped_rel = + save_relinfolist(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG]); + save_grouped_info = save_relinfolist(root->agg_info_list); + Assert(root->join_rel_level == NULL); root->join_rel_list->hash = NULL; + root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG].hash = NULL; + root->agg_info_list->hash = NULL; /* construct the best path for the given combination of relations */ joinrel = gimme_tree(root, tour, num_gene); @@ -118,12 +137,14 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) fitness = DBL_MAX; /* - * Restore join_rel_list to its former state, and put back original - * hashtable if any. + * Restore each of the list in join_rel_list, agg_info_list and + * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] to its former state, and put + * back original hashtable if any. 
*/ - root->join_rel_list->items = list_truncate(root->join_rel_list->items, - savelength); - root->join_rel_list->hash = savehash; + restore_relinfolist(root->join_rel_list, &save_join_rel); + restore_relinfolist(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], + &save_grouped_rel); + restore_relinfolist(root->agg_info_list, &save_grouped_info); /* release all the memory acquired within gimme_tree */ MemoryContextSwitchTo(oldcxt); @@ -279,6 +300,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene, /* Find and save the cheapest paths for this joinrel */ set_cheapest(joinrel); + /* + * Except for the topmost scan/join rel, consider generating + * partial aggregation paths for the grouped relation on top + * of the paths of this rel. After that, we're done creating + * paths for the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(joinrel->relids, root->all_query_rels)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + rel_grouped = find_grouped_rel(root, joinrel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, joinrel, + agg_info); + set_cheapest(rel_grouped); + } + } + /* Absorb new clump into old */ old_clump->joinrel = joinrel; old_clump->size += new_clump->size; @@ -336,3 +378,27 @@ desirable_join(PlannerInfo *root, /* Otherwise postpone the join till later. */ return false; } + +/* + * Save the original length and hashtable of a RelInfoList. + */ +static RelInfoListInfo +save_relinfolist(RelInfoList *relinfo_list) +{ + RelInfoListInfo info; + + info.savelength = list_length(relinfo_list->items); + info.savehash = relinfo_list->hash; + + return info; +} + +/* + * Restore the original length and hashtable of a RelInfoList. 
+ */ +static void +restore_relinfolist(RelInfoList *relinfo_list, RelInfoListInfo *info) +{ + relinfo_list->items = list_truncate(relinfo_list->items, info->savelength); + relinfo_list->hash = info->savehash; +} diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index b550e707a4..03795a0ec4 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -40,6 +40,7 @@ #include "optimizer/paths.h" #include "optimizer/plancat.h" #include "optimizer/planner.h" +#include "optimizer/prep.h" #include "optimizer/tlist.h" #include "parser/parse_clause.h" #include "parser/parsetree.h" @@ -47,6 +48,7 @@ #include "port/pg_bitutils.h" #include "rewrite/rewriteManip.h" #include "utils/lsyscache.h" +#include "utils/selfuncs.h" /* Bitmask flags for pushdown_safety_info.unsafeFlags */ @@ -77,6 +79,7 @@ typedef enum pushdown_safe_type /* These parameters are set by GUC */ bool enable_geqo = false; /* just in case GUC doesn't set it */ +bool enable_eager_aggregate = false; int geqo_threshold; int min_parallel_table_scan_size; int min_parallel_index_scan_size; @@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL; static void set_base_rel_consider_startup(PlannerInfo *root); static void set_base_rel_sizes(PlannerInfo *root); +static void setup_base_grouped_rels(PlannerInfo *root); static void set_base_rel_pathlists(PlannerInfo *root); static void set_rel_size(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); @@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); +static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel); static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel, List *live_childrels, List *all_child_pathkeys); @@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List 
*joinlist) */ set_base_rel_sizes(root); + /* + * Build grouped base relations for each base rel if possible. + */ + setup_base_grouped_rels(root); + /* * We should now have size estimates for every actual table involved in * the query, and we also know which if any have been deleted from the @@ -323,6 +333,53 @@ set_base_rel_sizes(PlannerInfo *root) } } +/* + * setup_base_grouped_rels + * For each "plain" base relation, build a grouped base relation if eager + * aggregation is possible and if this relation can produce grouped paths. + */ +static void +setup_base_grouped_rels(PlannerInfo *root) +{ + Index rti; + + /* + * If there are no aggregate expressions or grouping expressions, eager + * aggregation is not possible. + */ + if (root->agg_clause_list == NIL || + root->group_expr_list == NIL) + return; + + /* + * Eager aggregation only makes sense if there are multiple base rels in + * the query. + */ + if (bms_membership(root->all_baserels) != BMS_MULTIPLE) + return; + + for (rti = 1; rti < root->simple_rel_array_size; rti++) + { + RelOptInfo *rel = root->simple_rel_array[rti]; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + /* there may be empty slots corresponding to non-baserel RTEs */ + if (rel == NULL) + continue; + + Assert(rel->relid == rti); /* sanity check on array */ + Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */ + + rel_grouped = build_simple_grouped_rel(root, rel->relid, &agg_info); + if (rel_grouped) + { + /* Make the grouped relation available for joining. */ + add_grouped_rel(root, rel_grouped, agg_info); + } + } +} + /* * set_base_rel_pathlists * Finds all paths available for scanning each base-relation entry. @@ -559,6 +616,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, /* Now find the cheapest of the paths for this rel */ set_cheapest(rel); + /* + * If a grouped relation for this rel exists, build partial aggregation + * paths for it. 
+ * + * Note that this can only happen after we've called set_cheapest() for + * this base rel, because we need its cheapest paths. + */ + set_grouped_rel_pathlist(root, rel); + #ifdef OPTIMIZER_DEBUG pprint(rel); #endif @@ -1294,6 +1360,28 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, add_paths_to_append_rel(root, rel, live_childrels); } +/* + * set_grouped_rel_pathlist + * If a grouped relation for the given 'rel' exists, build partial + * aggregation paths for it. + */ +static void +set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel) +{ + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + /* Add paths to the grouped base relation if one exists. */ + rel_grouped = find_grouped_rel(root, rel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, rel, + agg_info); + set_cheapest(rel_grouped); + } +} + /* * add_paths_to_append_rel @@ -3302,6 +3390,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r } } +/* + * generate_grouped_paths + * Generate paths for a grouped relation by adding sorted and hashed + * partial aggregation paths on top of paths of the plain base or join + * relation. + * + * The information needed is provided by the RelAggInfo structure. + */ +void +generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, + RelOptInfo *rel_plain, RelAggInfo *agg_info) +{ + AggClauseCosts agg_costs; + bool can_hash; + bool can_sort; + Path *cheapest_total_path = NULL; + Path *cheapest_partial_path = NULL; + double dNumGroups = 0; + double dNumPartialGroups = 0; + + if (IS_DUMMY_REL(rel_plain)) + { + mark_dummy_rel(rel_grouped); + return; + } + + MemSet(&agg_costs, 0, sizeof(AggClauseCosts)); + get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs); + + /* + * Determine whether it's possible to perform sort-based implementations + * of grouping. 
+ */ + can_sort = grouping_is_sortable(agg_info->group_clauses); + + /* + * Determine whether we should consider hash-based implementations of + * grouping. + */ + Assert(root->numOrderedAggs == 0); + can_hash = (agg_info->group_clauses != NIL && + grouping_is_hashable(agg_info->group_clauses)); + + /* + * Consider whether we should generate partially aggregated non-partial + * paths. We can only do this if we have a non-partial path. + */ + if (rel_plain->pathlist != NIL) + { + cheapest_total_path = rel_plain->cheapest_total_path; + Assert(cheapest_total_path != NULL); + } + + /* + * If parallelism is possible for rel_grouped, then we should consider + * generating partially-grouped partial paths. However, if the plain rel + * has no partial paths, then we can't. + */ + if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL) + { + cheapest_partial_path = linitial(rel_plain->partial_pathlist); + Assert(cheapest_partial_path != NULL); + } + + /* Estimate number of partial groups. */ + if (cheapest_total_path != NULL) + dNumGroups = estimate_num_groups(root, + agg_info->group_exprs, + cheapest_total_path->rows, + NULL, NULL); + if (cheapest_partial_path != NULL) + dNumPartialGroups = estimate_num_groups(root, + agg_info->group_exprs, + cheapest_partial_path->rows, + NULL, NULL); + + if (can_sort && cheapest_total_path != NULL) + { + ListCell *lc; + + /* + * Use any available suitably-sorted path as input, and also consider + * sorting the cheapest-total path. + */ + foreach(lc, rel_plain->pathlist) + { + Path *input_path = (Path *) lfirst(lc); + Path *path; + bool is_sorted; + int presorted_keys; + + /* + * Since the path originates from a non-grouped relation that is + * not aware of eager aggregation, we must ensure that it provides + * the correct input for partial aggregation. 
+ */ + path = (Path *) create_projection_path(root, + rel_grouped, + input_path, + agg_info->agg_input); + + is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys, + path->pathkeys, + &presorted_keys); + if (!is_sorted) + { + /* + * Try at least sorting the cheapest path and also try + * incrementally sorting any path which is partially sorted + * already (no need to deal with paths which have presorted + * keys when incremental sort is disabled unless it's the + * cheapest input path). + */ + if (input_path != cheapest_total_path && + (presorted_keys == 0 || !enable_incremental_sort)) + continue; + + /* + * We've no need to consider both a sort and incremental sort. + * We'll just do a sort if there are no presorted keys and an + * incremental sort when there are presorted keys. + */ + if (presorted_keys == 0 || !enable_incremental_sort) + path = (Path *) create_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + -1.0); + else + path = (Path *) create_incremental_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + presorted_keys, + -1.0); + } + + /* + * qual is NIL because the HAVING clause cannot be evaluated until + * the final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_SORTED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumGroups); + + add_path(rel_grouped, path); + } + } + + if (can_sort && cheapest_partial_path != NULL) + { + ListCell *lc; + + /* Similar to above logic, but for partial paths. */ + foreach(lc, rel_plain->partial_pathlist) + { + Path *input_path = (Path *) lfirst(lc); + Path *path; + bool is_sorted; + int presorted_keys; + + /* + * Since the path originates from a non-grouped relation that is + * not aware of eager aggregation, we must ensure that it provides + * the correct input for partial aggregation. 
+ */ + path = (Path *) create_projection_path(root, + rel_grouped, + input_path, + agg_info->agg_input); + + is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys, + path->pathkeys, + &presorted_keys); + + if (!is_sorted) + { + /* + * Try at least sorting the cheapest path and also try + * incrementally sorting any path which is partially sorted + * already (no need to deal with paths which have presorted + * keys when incremental sort is disabled unless it's the + * cheapest input path). + */ + if (input_path != cheapest_partial_path && + (presorted_keys == 0 || !enable_incremental_sort)) + continue; + + /* + * We've no need to consider both a sort and incremental sort. + * We'll just do a sort if there are no presorted keys and an + * incremental sort when there are presorted keys. + */ + if (presorted_keys == 0 || !enable_incremental_sort) + path = (Path *) create_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + -1.0); + else + path = (Path *) create_incremental_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + presorted_keys, + -1.0); + } + + /* + * qual is NIL because the HAVING clause cannot be evaluated until + * the final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_SORTED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumPartialGroups); + + add_partial_path(rel_grouped, path); + } + } + + /* + * Add a partially-grouped HashAgg Path where possible + */ + if (can_hash && cheapest_total_path != NULL) + { + Path *path; + + /* + * Since the path originates from a non-grouped relation that is not + * aware of eager aggregation, we must ensure that it provides the + * correct input for partial aggregation. 
+ */ + path = (Path *) create_projection_path(root, + rel_grouped, + cheapest_total_path, + agg_info->agg_input); + + /* + * qual is NIL because the HAVING clause cannot be evaluated until the + * final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_HASHED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumGroups); + + add_path(rel_grouped, path); + } + + /* + * Now add a partially-grouped HashAgg partial Path where possible + */ + if (can_hash && cheapest_partial_path != NULL) + { + Path *path; + + /* + * Since the path originates from a non-grouped relation that is not + * aware of eager aggregation, we must ensure that it provides the + * correct input for partial aggregation. + */ + path = (Path *) create_projection_path(root, + rel_grouped, + cheapest_partial_path, + agg_info->agg_input); + + /* + * qual is NIL because the HAVING clause cannot be evaluated until the + * final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_HASHED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumPartialGroups); + + add_partial_path(rel_grouped, path); + } +} + /* * make_rel_from_joinlist * Build access paths using a "joinlist" to guide the join path search. @@ -3462,6 +3855,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) * * After that, we're done creating paths for the joinrel, so run * set_cheapest(). + * + * In addition, we also run generate_grouped_paths() for the grouped + * relation of each just-processed joinrel, and run set_cheapest() for + * the grouped relation afterwards. 
*/ foreach(lc, root->join_rel_level[lev]) { @@ -3482,6 +3879,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) /* Find and save the cheapest paths for this rel */ set_cheapest(rel); + /* + * Except for the topmost scan/join rel, consider generating + * partial aggregation paths for the grouped relation on top of + * the paths of this rel. After that, we're done creating paths + * for the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(rel->relids, root->all_query_rels)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + rel_grouped = find_grouped_rel(root, rel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, rel, + agg_info); + set_cheapest(rel_grouped); + } + } + #ifdef OPTIMIZER_DEBUG pprint(rel); #endif @@ -4350,6 +4768,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel) if (IS_DUMMY_REL(child_rel)) continue; + /* + * Except for the topmost scan/join rel, consider generating partial + * aggregation paths for the grouped relation on top of the paths of + * this partitioned child-join. After that, we're done creating paths + * for the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(IS_OTHER_REL(rel) ? 
+ rel->top_parent_relids : rel->relids, + root->all_query_rels)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + rel_grouped = find_grouped_rel(root, child_rel->relids, + &agg_info); + if (rel_grouped) + { + generate_grouped_paths(root, rel_grouped, child_rel, + agg_info); + set_cheapest(rel_grouped); + } + } + #ifdef OPTIMIZER_DEBUG pprint(child_rel); #endif diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 7db5e30eef..e1a2d3b414 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -16,11 +16,13 @@ #include "miscadmin.h" #include "optimizer/appendinfo.h" +#include "optimizer/cost.h" #include "optimizer/joininfo.h" #include "optimizer/pathnode.h" #include "optimizer/paths.h" #include "partitioning/partbounds.h" #include "utils/memutils.h" +#include "utils/selfuncs.h" static void make_rels_by_clause_joins(PlannerInfo *root, @@ -35,6 +37,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel); static bool restriction_is_constant_false(List *restrictlist, RelOptInfo *joinrel, bool only_pushed_down); +static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, + RelOptInfo *rel2, RelOptInfo *joinrel, + SpecialJoinInfo *sjinfo, List *restrictlist); static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, RelOptInfo *joinrel, SpecialJoinInfo *sjinfo, List *restrictlist); @@ -771,6 +776,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2) return joinrel; } + /* Build a grouped join relation for 'joinrel' if possible. */ + make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo, + restrictlist); + /* Add paths to the join relation. 
*/ populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo, restrictlist); @@ -882,6 +891,127 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids, return input_relids; } +/* + * make_grouped_join_rel + * Build a grouped join relation out of 'joinrel' if eager aggregation is + * possible and the 'joinrel' can produce grouped paths. + * + * We also generate partial aggregation paths for the grouped relation by + * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by + * joining the grouped paths of 'rel2' to the plain paths of 'rel1'. + */ +static void +make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, + RelOptInfo *rel2, RelOptInfo *joinrel, + SpecialJoinInfo *sjinfo, List *restrictlist) +{ + RelOptInfo *rel_grouped; + RelAggInfo *agg_info = NULL; + RelOptInfo *rel1_grouped; + RelOptInfo *rel2_grouped; + bool rel1_empty; + bool rel2_empty; + + /* + * If there are no aggregate expressions or grouping expressions, eager + * aggregation is not possible. + */ + if (root->agg_clause_list == NIL || + root->group_expr_list == NIL) + return; + + /* + * See if we already have a grouped joinrel for this joinrel. + */ + rel_grouped = find_grouped_rel(root, joinrel->relids, &agg_info); + + /* + * Construct a new RelOptInfo for the grouped join relation if there is no + * existing one. + */ + if (rel_grouped == NULL) + { + /* + * Prepare the information needed to create grouped paths for this + * join relation. + */ + agg_info = create_rel_agg_info(root, joinrel); + if (agg_info == NULL) + return; + + /* build a grouped relation out of the plain relation */ + rel_grouped = build_grouped_rel(root, joinrel); + rel_grouped->reltarget = agg_info->target; + rel_grouped->rows = agg_info->grouped_rows; + + /* + * Make the grouped relation available for further joining or for + * acting as the upper rel representing the result of partial + * aggregation. 
+ */ + add_grouped_rel(root, rel_grouped, agg_info); + } + + Assert(agg_info != NULL); + + /* We may have already proven this grouped join relation to be dummy. */ + if (IS_DUMMY_REL(rel_grouped)) + return; + + /* Retrieve the grouped relations for the two input rels */ + rel1_grouped = find_grouped_rel(root, rel1->relids, NULL); + rel2_grouped = find_grouped_rel(root, rel2->relids, NULL); + + rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped)); + rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped)); + + /* Nothing to do if there's no grouped relation. */ + if (rel1_empty && rel2_empty) + return; + + /* + * Joining two grouped relations is currently not supported. Grouping one + * side would alter the occurrence of the other side's aggregate transient + * states in the final aggregation input. While this issue could be + * addressed by adjusting the transient states, it is not deemed + * worthwhile for now. + */ + if (!rel1_empty && !rel2_empty) + return; + + /* Generate partial aggregation paths for the grouped relation */ + if (!rel1_empty) + { + set_joinrel_size_estimates(root, rel_grouped, rel1_grouped, rel2, + sjinfo, restrictlist); + populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped, + sjinfo, restrictlist); + + /* + * It shouldn't happen that we have marked rel1_grouped as dummy in + * populate_joinrel_with_paths due to provably constant-false join + * restrictions, hence we wouldn't end up with a plan that has Aggref + * in non-Agg plan node. 
+ */ + Assert(!IS_DUMMY_REL(rel1_grouped)); + } + else if (!rel2_empty) + { + set_joinrel_size_estimates(root, rel_grouped, rel1, rel2_grouped, + sjinfo, restrictlist); + populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped, + sjinfo, restrictlist); + + /* + * It shouldn't happen that we have marked rel2_grouped as dummy in + * populate_joinrel_with_paths due to provably constant-false join + * restrictions, hence we wouldn't end up with a plan that has Aggref + * in non-Agg plan node. + */ + Assert(!IS_DUMMY_REL(rel2_grouped)); + } +} + /* * populate_joinrel_with_paths * Add paths to the given joinrel for given pair of joining relations. The @@ -1674,6 +1804,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, adjust_child_relids(joinrel->relids, nappinfos, appinfos))); + /* Build a grouped join relation for 'child_joinrel' if possible */ + make_grouped_join_rel(root, child_rel1, child_rel2, + child_joinrel, child_sjinfo, + child_restrictlist); + /* And make paths for the child join */ populate_joinrel_with_paths(root, child_rel1, child_rel2, child_joinrel, child_sjinfo, diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index e2c68fe6f9..2ca035dd80 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -14,6 +14,7 @@ */ #include "postgres.h" +#include "access/nbtree.h" #include "catalog/pg_type.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" @@ -80,6 +81,8 @@ typedef struct JoinTreeItem } JoinTreeItem; +static void create_agg_clause_infos(PlannerInfo *root); +static void create_grouping_expr_infos(PlannerInfo *root); static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex); static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode, @@ -327,6 +330,255 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars, } } +/* + * setup_eager_aggregation + * Check if eager aggregation is 
applicable, and if so collect suitable + * aggregate expressions and grouping expressions in the query. + */ +void +setup_eager_aggregation(PlannerInfo *root) +{ + /* + * Don't apply eager aggregation if disabled by user. + */ + if (!enable_eager_aggregate) + return; + + /* + * Don't apply eager aggregation if there are no available GROUP BY + * clauses. + */ + if (!root->processed_groupClause) + return; + + /* + * For now we don't try to support grouping sets. + */ + if (root->parse->groupingSets) + return; + + /* + * For now we don't try to support DISTINCT or ORDER BY aggregates. + */ + if (root->numOrderedAggs > 0) + return; + + /* + * If there are any aggregates that do not support partial mode, or any + * partial aggregates that are non-serializable, do not apply eager + * aggregation. + */ + if (root->hasNonPartialAggs || root->hasNonSerialAggs) + return; + + /* + * We don't try to apply eager aggregation if there are set-returning + * functions in targetlist. + */ + if (root->parse->hasTargetSRFs) + return; + + /* + * Collect aggregate expressions and plain Vars that appear in targetlist + * and havingQual. + */ + create_agg_clause_infos(root); + + /* + * If there are no suitable aggregate expressions, we cannot apply eager + * aggregation. + */ + if (root->agg_clause_list == NIL) + return; + + /* + * Collect grouping expressions that appear in grouping clauses. + */ + create_grouping_expr_infos(root); +} + +/* + * create_agg_clause_infos + * Search the targetlist and havingQual for Aggrefs and plain Vars, and + * create an AggClauseInfo for each Aggref node. + */ +static void +create_agg_clause_infos(PlannerInfo *root) +{ + List *tlist_exprs; + ListCell *lc; + + Assert(root->agg_clause_list == NIL); + Assert(root->tlist_vars == NIL); + + tlist_exprs = pull_var_clause((Node *) root->processed_tlist, + PVC_INCLUDE_AGGREGATES | + PVC_RECURSE_WINDOWFUNCS | + PVC_RECURSE_PLACEHOLDERS); + + /* + * For now we don't try to support GROUPING() expressions. 
+ */ + foreach(lc, tlist_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + + if (IsA(expr, GroupingFunc)) + return; + } + + /* + * Aggregates within the HAVING clause need to be processed in the same + * way as those in the targetlist. Note that HAVING can contain Aggrefs + * but not WindowFuncs. + */ + if (root->parse->havingQual != NULL) + { + List *having_exprs; + + having_exprs = pull_var_clause((Node *) root->parse->havingQual, + PVC_INCLUDE_AGGREGATES | + PVC_RECURSE_PLACEHOLDERS); + if (having_exprs != NIL) + { + tlist_exprs = list_concat(tlist_exprs, having_exprs); + list_free(having_exprs); + } + } + + foreach(lc, tlist_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + Aggref *aggref; + AggClauseInfo *ac_info; + + /* + * collect plain Vars for future reference + */ + if (IsA(expr, Var)) + { + root->tlist_vars = list_append_unique(root->tlist_vars, expr); + continue; + } + + aggref = castNode(Aggref, expr); + + Assert(aggref->aggorder == NIL); + Assert(aggref->aggdistinct == NIL); + + ac_info = makeNode(AggClauseInfo); + ac_info->aggref = aggref; + ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref); + + root->agg_clause_list = + list_append_unique(root->agg_clause_list, ac_info); + } + + list_free(tlist_exprs); +} + +/* + * create_grouping_expr_infos + * Create GroupExprInfo for each expression usable as grouping key. + * + * If any grouping expression is not suitable, we will just return with + * root->group_expr_list being NIL. 
+ */ +static void +create_grouping_expr_infos(PlannerInfo *root) +{ + List *exprs = NIL; + List *sortgrouprefs = NIL; + List *btree_opfamilies = NIL; + ListCell *lc, + *lc1, + *lc2, + *lc3; + + Assert(root->group_expr_list == NIL); + + foreach(lc, root->processed_groupClause) + { + SortGroupClause *sgc = lfirst_node(SortGroupClause, lc); + TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist); + TypeCacheEntry *tce; + Oid equalimageproc; + Oid eq_op; + List *eq_opfamilies; + Oid btree_opfamily; + + Assert(tle->ressortgroupref > 0); + + /* + * For now we only support plain Vars as grouping expressions. + */ + if (!IsA(tle->expr, Var)) + return; + + /* + * Eager aggregation is only possible if equality of grouping keys, as + * defined by the equality operator, implies bitwise equality. + * Otherwise, if we put keys with different byte images into the same + * group, we may lose some information that could be needed to + * evaluate upper qual clauses. + * + * For example, the NUMERIC data type is not supported because values + * that fall into the same group according to the equality operator + * (e.g. 0 and 0.0) can have different scale. + */ + tce = lookup_type_cache(exprType((Node *) tle->expr), + TYPECACHE_BTREE_OPFAMILY); + if (!OidIsValid(tce->btree_opf) || + !OidIsValid(tce->btree_opintype)) + return; + + equalimageproc = get_opfamily_proc(tce->btree_opf, + tce->btree_opintype, + tce->btree_opintype, + BTEQUALIMAGE_PROC); + if (!OidIsValid(equalimageproc) || + !DatumGetBool(OidFunctionCall1Coll(equalimageproc, + tce->typcollation, + ObjectIdGetDatum(tce->btree_opintype)))) + return; + + /* + * Get the operator in the btree's opfamily. 
+ */ + eq_op = get_opfamily_member(tce->btree_opf, + tce->btree_opintype, + tce->btree_opintype, + BTEqualStrategyNumber); + if (!OidIsValid(eq_op)) + return; + eq_opfamilies = get_mergejoin_opfamilies(eq_op); + if (!eq_opfamilies) + return; + btree_opfamily = linitial_oid(eq_opfamilies); + + exprs = lappend(exprs, tle->expr); + sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref); + btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily); + } + + /* + * Construct GroupExprInfo for each expression. + */ + forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies) + { + Expr *expr = (Expr *) lfirst(lc1); + int sortgroupref = lfirst_int(lc2); + Oid btree_opfamily = lfirst_oid(lc3); + GroupExprInfo *ge_info; + + ge_info = makeNode(GroupExprInfo); + ge_info->expr = (Expr *) copyObject(expr); + ge_info->sortgroupref = sortgroupref; + ge_info->btree_opfamily = btree_opfamily; + + root->group_expr_list = lappend(root->group_expr_list, ge_info); + } +} /***************************************************************************** * diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index fd8b2b0ca3..ece6936e23 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -67,6 +67,9 @@ query_planner(PlannerInfo *root, root->join_rel_list = makeNode(RelInfoList); root->join_rel_list->items = NIL; root->join_rel_list->hash = NULL; + root->agg_info_list = makeNode(RelInfoList); + root->agg_info_list->items = NIL; + root->agg_info_list->hash = NULL; root->join_rel_level = NULL; root->join_cur_level = 0; root->canon_pathkeys = NIL; @@ -77,6 +80,9 @@ query_planner(PlannerInfo *root, root->placeholder_list = NIL; root->placeholder_array = NULL; root->placeholder_array_size = 0; + root->agg_clause_list = NIL; + root->group_expr_list = NIL; + root->tlist_vars = NIL; root->fkey_list = NIL; root->initial_rels = NIL; @@ -258,6 +264,12 @@ query_planner(PlannerInfo *root, */ 
extract_restriction_or_clauses(root); + /* + * Check if eager aggregation is applicable, and if so, set up + * root->agg_clause_list and root->group_expr_list. + */ + setup_eager_aggregation(root); + /* * Now expand appendrels by adding "otherrels" for their children. We * delay this to the end so that we have as much information as possible diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 948afd9094..89a8f39031 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -225,7 +225,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, RelOptInfo *partially_grouped_rel, const AggClauseCosts *agg_costs, grouping_sets_data *gd, - double dNumGroups, GroupPathExtraData *extra); static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root, RelOptInfo *grouped_rel, @@ -3999,9 +3998,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, GroupPathExtraData *extra, RelOptInfo **partially_grouped_rel_p) { - Path *cheapest_path = input_rel->cheapest_total_path; RelOptInfo *partially_grouped_rel = NULL; - double dNumGroups; PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE; /* @@ -4082,23 +4079,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, /* Gather any partially grouped partial paths. */ if (partially_grouped_rel && partially_grouped_rel->partial_pathlist) - { gather_grouping_paths(root, partially_grouped_rel); - set_cheapest(partially_grouped_rel); - } - /* - * Estimate number of groups. - */ - dNumGroups = get_number_of_groups(root, - cheapest_path->rows, - gd, - extra->targetList); + /* Now choose the best path(s) for partially_grouped_rel. 
*/ + if (partially_grouped_rel && partially_grouped_rel->pathlist) + set_cheapest(partially_grouped_rel); /* Build final grouping paths */ add_paths_to_grouping_rel(root, input_rel, grouped_rel, partially_grouped_rel, agg_costs, gd, - dNumGroups, extra); + extra); /* Give a helpful error if we failed to find any implementation */ if (grouped_rel->pathlist == NIL) @@ -6966,16 +6956,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, RelOptInfo *grouped_rel, RelOptInfo *partially_grouped_rel, const AggClauseCosts *agg_costs, - grouping_sets_data *gd, double dNumGroups, + grouping_sets_data *gd, GroupPathExtraData *extra) { Query *parse = root->parse; Path *cheapest_path = input_rel->cheapest_total_path; + Path *cheapest_partially_grouped_path = NULL; ListCell *lc; bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0; bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0; List *havingQual = (List *) extra->havingQual; AggClauseCosts *agg_final_costs = &extra->agg_final_costs; + double dNumGroups = 0; + double dNumFinalGroups = 0; + + /* + * Estimate number of groups for non-split aggregation. + */ + dNumGroups = get_number_of_groups(root, + cheapest_path->rows, + gd, + extra->targetList); + + if (partially_grouped_rel && partially_grouped_rel->pathlist) + { + cheapest_partially_grouped_path = + partially_grouped_rel->cheapest_total_path; + + /* + * Estimate number of groups for final phase of partial aggregation. 
+ */ + dNumFinalGroups = + get_number_of_groups(root, + cheapest_partially_grouped_path->rows, + gd, + extra->targetList); + } if (can_sort) { @@ -7087,7 +7103,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, path = make_ordered_path(root, grouped_rel, path, - partially_grouped_rel->cheapest_total_path, + cheapest_partially_grouped_path, info->pathkeys); if (path == NULL) @@ -7104,7 +7120,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, info->clauses, havingQual, agg_final_costs, - dNumGroups)); + dNumFinalGroups)); else add_path(grouped_rel, (Path *) create_group_path(root, @@ -7112,7 +7128,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, path, info->clauses, havingQual, - dNumGroups)); + dNumFinalGroups)); } } @@ -7154,19 +7170,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, */ if (partially_grouped_rel && partially_grouped_rel->pathlist) { - Path *path = partially_grouped_rel->cheapest_total_path; - add_path(grouped_rel, (Path *) create_agg_path(root, grouped_rel, - path, + cheapest_partially_grouped_path, grouped_rel->reltarget, AGG_HASHED, AGGSPLIT_FINAL_DESERIAL, root->processed_groupClause, havingQual, agg_final_costs, - dNumGroups)); + dNumFinalGroups)); } } @@ -7216,6 +7230,21 @@ create_partial_grouping_paths(PlannerInfo *root, bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0; bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0; + /* + * The partially_grouped_rel could have been already created due to eager + * aggregation. + */ + partially_grouped_rel = find_grouped_rel(root, input_rel->relids, NULL); + Assert(enable_eager_aggregate || partially_grouped_rel == NULL); + + /* + * It is possible that the partially_grouped_rel created by eager + * aggregation is dummy. In this case we just set it to NULL. It might + * be created again by the following logic if possible. 
+ */ + if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel)) + partially_grouped_rel = NULL; + /* * Consider whether we should generate partially aggregated non-partial * paths. We can only do this if we have a non-partial path, and only if @@ -7239,19 +7268,27 @@ create_partial_grouping_paths(PlannerInfo *root, * If we can't partially aggregate partial paths, and we can't partially * aggregate non-partial paths, then don't bother creating the new * RelOptInfo at all, unless the caller specified force_rel_creation. + * + * Note that the partially_grouped_rel could have been already created and + * populated with appropriate paths by eager aggregation. */ if (cheapest_total_path == NULL && cheapest_partial_path == NULL && + (partially_grouped_rel == NULL || + partially_grouped_rel->pathlist == NIL) && !force_rel_creation) return NULL; /* * Build a new upper relation to represent the result of partially - * aggregating the rows from the input relation. - */ - partially_grouped_rel = fetch_upper_rel(root, - UPPERREL_PARTIAL_GROUP_AGG, - grouped_rel->relids); + * aggregating the rows from the input relation. The relation may already + * exist due to eager aggregation, in which case we don't need to create + * it. + */ + if (partially_grouped_rel == NULL) + partially_grouped_rel = fetch_upper_rel(root, + UPPERREL_PARTIAL_GROUP_AGG, + grouped_rel->relids); partially_grouped_rel->consider_parallel = grouped_rel->consider_parallel; partially_grouped_rel->reloptkind = grouped_rel->reloptkind; @@ -7260,6 +7297,14 @@ create_partial_grouping_paths(PlannerInfo *root, partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent; partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine; + /* + * Partially-grouped partial paths may have been generated by eager + * aggregation. If we find that parallelism is not possible for + * partially_grouped_rel, we need to drop these partial paths. 
+ */ + if (!partially_grouped_rel->consider_parallel) + partially_grouped_rel->partial_pathlist = NIL; + /* * Build target list for partial aggregate paths. These paths cannot just * emit the same tlist as regular aggregate paths, because (1) we must diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index 4989722637..4884d9ddea 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -499,6 +499,66 @@ adjust_appendrel_attrs_mutator(Node *node, return (Node *) newinfo; } + /* + * We have to process RelAggInfo nodes specially. + */ + if (IsA(node, RelAggInfo)) + { + RelAggInfo *oldinfo = (RelAggInfo *) node; + RelAggInfo *newinfo = makeNode(RelAggInfo); + + /* Copy all flat-copiable fields */ + memcpy(newinfo, oldinfo, sizeof(RelAggInfo)); + + newinfo->relids = adjust_child_relids(oldinfo->relids, + context->nappinfos, + context->appinfos); + + newinfo->target = (PathTarget *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->target, + context); + + newinfo->agg_input = (PathTarget *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input, + context); + + newinfo->group_clauses = (List *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses, + context); + + newinfo->group_exprs = (List *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs, + context); + + return (Node *) newinfo; + } + + /* + * We have to process PathTarget nodes specially. 
+ */ + if (IsA(node, PathTarget)) + { + PathTarget *oldtarget = (PathTarget *) node; + PathTarget *newtarget = makeNode(PathTarget); + + /* Copy all flat-copiable fields */ + memcpy(newtarget, oldtarget, sizeof(PathTarget)); + + if (oldtarget->sortgrouprefs) + { + Size nbytes = list_length(oldtarget->exprs) * sizeof(Index); + + newtarget->exprs = (List *) + adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs, + context); + + newtarget->sortgrouprefs = (Index *) palloc(nbytes); + memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes); + } + + return (Node *) newtarget; + } + /* * NOTE: we do not need to recurse into sublinks, because they should * already have been converted to subplans before we see them. diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 54e042a8a5..3cb450b376 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -2702,8 +2702,7 @@ create_projection_path(PlannerInfo *root, pathnode->path.pathtype = T_Result; pathnode->path.parent = rel; pathnode->path.pathtarget = target; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe && @@ -2955,8 +2954,7 @@ create_incremental_sort_path(PlannerInfo *root, pathnode->path.parent = rel; /* Sort doesn't project, so use source path's pathtarget */ pathnode->path.pathtarget = subpath->pathtarget; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; @@ -3002,8 +3000,7 @@ create_sort_path(PlannerInfo *root, pathnode->path.parent = rel; /* Sort doesn't project, so use 
source path's pathtarget */ pathnode->path.pathtarget = subpath->pathtarget; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; @@ -3161,8 +3158,7 @@ create_agg_path(PlannerInfo *root, pathnode->path.pathtype = T_Agg; pathnode->path.parent = rel; pathnode->path.pathtarget = target; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index 76e13971f7..29806a3965 100644 --- a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -16,6 +16,7 @@ #include <limits.h> +#include "catalog/pg_constraint.h" #include "miscadmin.h" #include "nodes/nodeFuncs.h" #include "optimizer/appendinfo.h" @@ -27,22 +28,25 @@ #include "optimizer/paths.h" #include "optimizer/placeholder.h" #include "optimizer/plancat.h" +#include "optimizer/planner.h" #include "optimizer/restrictinfo.h" #include "optimizer/tlist.h" +#include "parser/parse_oper.h" #include "parser/parse_relation.h" #include "rewrite/rewriteManip.h" #include "utils/hsearch.h" #include "utils/lsyscache.h" +#include "utils/selfuncs.h" /* - * An entry of a hash table that we use to make lookup for RelOptInfo - * structures more efficient. + * An entry of a hash table that we use to make lookup for RelOptInfo or + * RelAggInfo structures more efficient. 
*/ typedef struct RelInfoEntry { Relids relids; /* hash key --- MUST BE FIRST */ - RelOptInfo *rel; + void *data; } RelInfoEntry; static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, @@ -87,6 +91,14 @@ static void build_child_join_reltarget(PlannerInfo *root, RelOptInfo *childrel, int nappinfos, AppendRelInfo **appinfos); +static bool eager_aggregation_possible_for_relation(PlannerInfo *root, + RelOptInfo *rel); +static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, + PathTarget *target, PathTarget *agg_input, + List **group_clauses, List **group_exprs); +static bool is_var_in_aggref_only(PlannerInfo *root, Var *var); +static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel); +static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr); /* @@ -410,6 +422,101 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) return rel; } +/* + * build_simple_grouped_rel + * Construct a new RelOptInfo for a grouped base relation out of an existing + * non-grouped base relation. + * + * On success, the new RelOptInfo is returned and the corresponding RelAggInfo + * is stored in *agg_info_p. + */ +RelOptInfo * +build_simple_grouped_rel(PlannerInfo *root, int relid, + RelAggInfo **agg_info_p) +{ + RelOptInfo *rel_plain; + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + /* + * We should have available aggregate expressions and grouping + * expressions, otherwise we cannot reach here. + */ + Assert(root->agg_clause_list != NIL); + Assert(root->group_expr_list != NIL); + + rel_plain = find_base_rel(root, relid); + + /* nothing to do for dummy rel */ + if (IS_DUMMY_REL(rel_plain)) + return NULL; + + /* + * Prepare the information needed to create grouped paths for this base + * relation. 
+ */ + agg_info = create_rel_agg_info(root, rel_plain); + if (agg_info == NULL) + return NULL; + + /* build a grouped relation out of the plain relation */ + rel_grouped = build_grouped_rel(root, rel_plain); + rel_grouped->reltarget = agg_info->target; + rel_grouped->rows = agg_info->grouped_rows; + + /* return the RelAggInfo structure */ + *agg_info_p = agg_info; + + return rel_grouped; +} + +/* + * build_grouped_rel + * Build a grouped relation by flat copying a plain relation and resetting + * the necessary fields. + */ +RelOptInfo * +build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain) +{ + RelOptInfo *rel_grouped; + + rel_grouped = makeNode(RelOptInfo); + memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo)); + + /* + * clear path info + */ + rel_grouped->pathlist = NIL; + rel_grouped->ppilist = NIL; + rel_grouped->partial_pathlist = NIL; + rel_grouped->cheapest_startup_path = NULL; + rel_grouped->cheapest_total_path = NULL; + rel_grouped->cheapest_unique_path = NULL; + rel_grouped->cheapest_parameterized_paths = NIL; + + /* + * clear partition info + */ + rel_grouped->part_scheme = NULL; + rel_grouped->nparts = -1; + rel_grouped->boundinfo = NULL; + rel_grouped->partbounds_merged = false; + rel_grouped->partition_qual = NIL; + rel_grouped->part_rels = NULL; + rel_grouped->live_parts = NULL; + rel_grouped->all_partrels = NULL; + rel_grouped->partexprs = NULL; + rel_grouped->nullable_partexprs = NULL; + rel_grouped->consider_partitionwise_join = false; + + /* + * clear size estimates + */ + rel_grouped->rows = 0; + + return rel_grouped; +} + /* * find_base_rel * Find a base or otherrel relation entry, which must already exist. @@ -484,7 +591,7 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid) /* * build_rel_hash - * Construct the auxiliary hash table for relations. + * Construct the auxiliary hash table for relation-specific entries. 
*/ static void build_rel_hash(RelInfoList *list) @@ -504,19 +611,27 @@ build_rel_hash(RelInfoList *list) &hash_ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); - /* Insert all the already-existing relations */ + /* Insert all the already-existing relation-specific entries */ foreach(l, list->items) { - RelOptInfo *rel = (RelOptInfo *) lfirst(l); + void *item = lfirst(l); RelInfoEntry *hentry; bool found; + Relids relids; + + Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo)); + + if (IsA(item, RelOptInfo)) + relids = ((RelOptInfo *) item)->relids; + else + relids = ((RelAggInfo *) item)->relids; hentry = (RelInfoEntry *) hash_search(hashtab, - &(rel->relids), + &relids, HASH_ENTER, &found); Assert(!found); - hentry->rel = rel; + hentry->data = item; } list->hash = hashtab; @@ -524,9 +639,9 @@ build_rel_hash(RelInfoList *list) /* * find_rel_info - * Find an RelOptInfo entry. + * Find a RelOptInfo or a RelAggInfo entry. */ -static RelOptInfo * +static void * find_rel_info(RelInfoList *list, Relids relids) { if (list == NULL) @@ -557,7 +672,7 @@ find_rel_info(RelInfoList *list, Relids relids) HASH_FIND, NULL); if (hentry) - return hentry->rel; + return hentry->data; } else { @@ -565,10 +680,18 @@ find_rel_info(RelInfoList *list, Relids relids) foreach(l, list->items) { - RelOptInfo *rel = (RelOptInfo *) lfirst(l); + void *item = lfirst(l); + Relids item_relids; + + Assert(IsA(item, RelOptInfo) || IsA(item, RelAggInfo)); - if (bms_equal(rel->relids, relids)) - return rel; + if (IsA(item, RelOptInfo)) + item_relids = ((RelOptInfo *) item)->relids; + else + item_relids = ((RelAggInfo *) item)->relids; + + if (bms_equal(item_relids, relids)) + return item; } } @@ -583,44 +706,46 @@ find_rel_info(RelInfoList *list, Relids relids) RelOptInfo * find_join_rel(PlannerInfo *root, Relids relids) { - return find_rel_info(root->join_rel_list, relids); + return (RelOptInfo *) find_rel_info(root->join_rel_list, relids); } /* - * add_rel_info - * Add given relation 
to the given list. Also add it to the auxiliary - * hashtable if there is one. + * find_grouped_rel + * Returns relation entry corresponding to 'relids' (a set of RT indexes), + * or NULL if none exists. This is for grouped relations. + * + * If agg_info_p is not NULL, then also the corresponding RelAggInfo (if one + * exists) will be returned in *agg_info_p. */ -static void -add_rel_info(RelInfoList *list, RelOptInfo *rel) +RelOptInfo * +find_grouped_rel(PlannerInfo *root, Relids relids, RelAggInfo **agg_info_p) { - /* GEQO requires us to append the new relation to the end of the list! */ - list->items = lappend(list->items, rel); + RelOptInfo *rel; - /* store it into the auxiliary hashtable if there is one. */ - if (list->hash) + rel = (RelOptInfo *) find_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], + relids); + if (rel == NULL) { - RelInfoEntry *hentry; - bool found; + if (agg_info_p) + *agg_info_p = NULL; - hentry = (RelInfoEntry *) hash_search(list->hash, - &(rel->relids), - HASH_ENTER, - &found); - Assert(!found); - hentry->rel = rel; + return NULL; } -} -/* - * add_join_rel - * Add given join relation to the list of join relations in the given - * PlannerInfo. - */ -static void -add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) -{ - add_rel_info(root->join_rel_list, joinrel); + /* also return the corresponding RelAggInfo, if asked */ + if (agg_info_p) + { + RelAggInfo *agg_info; + + agg_info = (RelAggInfo *) find_rel_info(root->agg_info_list, relids); + + /* The relation exists, so the agg_info should be there too. */ + Assert(agg_info != NULL); + + *agg_info_p = agg_info; + } + + return rel; } /* @@ -672,6 +797,64 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel, } } +/* + * add_rel_info + * Add relation-specific entry to a list, and also add it to the auxiliary + * hashtable if there is one. 
+ */ +static void +add_rel_info(RelInfoList *list, void *data) +{ + Assert(IsA(data, RelOptInfo) || IsA(data, RelAggInfo)); + + /* GEQO requires us to append the new relation to the end of the list! */ + list->items = lappend(list->items, data); + + /* store it into the auxiliary hashtable if there is one. */ + if (list->hash) + { + RelInfoEntry *hentry; + bool found; + Relids relids; + + if (IsA(data, RelOptInfo)) + relids = ((RelOptInfo *) data)->relids; + else + relids = ((RelAggInfo *) data)->relids; + + hentry = (RelInfoEntry *) hash_search(list->hash, + &relids, + HASH_ENTER, + &found); + Assert(!found); + hentry->data = data; + } +} + +/* + * add_join_rel + * Add given join relation to the list of join relations in the given + * PlannerInfo. + */ +static void +add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) +{ + add_rel_info(root->join_rel_list, joinrel); +} + +/* + * add_grouped_rel + * Add given grouped relation to the list of grouped relations in the + * given PlannerInfo. Also add the corresponding RelAggInfo to + * root->agg_info_list. 
+ */ +void +add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, RelAggInfo *agg_info) +{ + add_rel_info(&root->upper_rels[UPPERREL_PARTIAL_GROUP_AGG], rel); + add_rel_info(root->agg_info_list, agg_info); +} + /* * build_join_rel * Returns relation entry corresponding to the union of two given rels, @@ -1491,7 +1674,7 @@ fetch_upper_rel(PlannerInfo *root, UpperRelationKind kind, Relids relids) /* If we already made this upperrel for the query, return it */ if (list) { - upperrel = find_rel_info(list, relids); + upperrel = (RelOptInfo *) find_rel_info(list, relids); if (upperrel) return upperrel; } @@ -2528,3 +2711,471 @@ build_child_join_reltarget(PlannerInfo *root, childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple; childrel->reltarget->width = parentrel->reltarget->width; } + +/* + * create_rel_agg_info + * Create the RelAggInfo structure for the given relation if it can produce + * grouped paths. The given relation is the non-grouped one which has the + * reltarget already constructed. + */ +RelAggInfo * +create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel) +{ + ListCell *lc; + RelAggInfo *result; + PathTarget *agg_input; + PathTarget *target; + List *group_clauses = NIL; + List *group_exprs = NIL; + + /* + * The lists of aggregate expressions and grouping expressions should have + * been constructed. + */ + Assert(root->agg_clause_list != NIL); + Assert(root->group_expr_list != NIL); + + /* + * If this is a child rel, the grouped rel for its parent rel must have + * been created if it can. So we can just use parent's RelAggInfo if + * there is one, with appropriate variable substitutions. 
+ */ + if (IS_OTHER_REL(rel)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + Assert(!bms_is_empty(rel->top_parent_relids)); + rel_grouped = find_grouped_rel(root, rel->top_parent_relids, &agg_info); + + if (rel_grouped == NULL) + return NULL; + + Assert(agg_info != NULL); + /* Must do multi-level transformation */ + agg_info = (RelAggInfo *) + adjust_appendrel_attrs_multilevel(root, + (Node *) agg_info, + rel, + rel->top_parent); + + agg_info->grouped_rows = + estimate_num_groups(root, agg_info->group_exprs, + rel->rows, NULL, NULL); + + return agg_info; + } + + /* Check if it's possible to produce grouped paths for this relation. */ + if (!eager_aggregation_possible_for_relation(root, rel)) + return NULL; + + /* + * Create targets for the grouped paths and for the input paths of the + * grouped paths. + */ + target = create_empty_pathtarget(); + agg_input = create_empty_pathtarget(); + + /* ... and initialize these targets */ + if (!init_grouping_targets(root, rel, target, agg_input, + &group_clauses, &group_exprs)) + return NULL; + + /* + * Eager aggregation is not applicable if there are no available grouping + * expressions. 
+ */ + if (list_length(group_clauses) == 0) + return NULL; + + /* build the RelAggInfo result */ + result = makeNode(RelAggInfo); + + result->group_clauses = group_clauses; + result->group_exprs = group_exprs; + + /* Calculate pathkeys that represent these grouping requirements */ + result->group_pathkeys = + make_pathkeys_for_sortclauses(root, result->group_clauses, + make_tlist_from_pathtarget(target)); + + /* Add aggregates to the grouping target */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + Aggref *aggref; + + Assert(IsA(ac_info->aggref, Aggref)); + + aggref = (Aggref *) copyObject(ac_info->aggref); + mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL); + + add_column_to_pathtarget(target, (Expr *) aggref, 0); + } + + /* Set the estimated eval cost and output width for both targets */ + set_pathtarget_cost_width(root, target); + set_pathtarget_cost_width(root, agg_input); + + result->relids = bms_copy(rel->relids); + result->target = target; + result->agg_input = agg_input; + result->grouped_rows = estimate_num_groups(root, result->group_exprs, + rel->rows, NULL, NULL); + + return result; +} + +/* + * eager_aggregation_possible_for_relation + * Check if it's possible to produce grouped paths for the given relation. + */ +static bool +eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel) +{ + ListCell *lc; + int cur_relid; + + /* + * Check to see if the given relation is in the nullable side of an outer + * join. In this case, we cannot push a partial aggregation down to the + * relation, because the NULL-extended rows produced by the outer join + * would not be available when we perform the partial aggregation, while + * with a non-eager-aggregation plan these rows are available for the + * top-level aggregation. Doing so may result in the rows being grouped + * differently than expected, or produce incorrect values from the + * aggregate functions. 
+ */ + cur_relid = -1; + while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0) + { + RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid); + + if (baserel == NULL) + continue; /* ignore outer joins in rel->relids */ + + if (!bms_is_subset(baserel->nulling_relids, rel->relids)) + return false; + } + + /* + * For now we don't try to support PlaceHolderVars. + */ + foreach(lc, rel->reltarget->exprs) + { + Expr *expr = lfirst(lc); + + if (IsA(expr, PlaceHolderVar)) + return false; + } + + /* Caller should only pass base relations or joins. */ + Assert(rel->reloptkind == RELOPT_BASEREL || + rel->reloptkind == RELOPT_JOINREL); + + /* + * Check if all aggregate expressions can be evaluated on this relation + * level. + */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + + Assert(IsA(ac_info->aggref, Aggref)); + + /* + * Give up if any aggregate needs relations other than the current + * one. + * + * If the aggregate needs the current rel plus anything else, grouping + * the current rel could make some input variables unavailable for the + * higher aggregate and also reduce the number of input rows it + * receives. + * + * If the aggregate does not need the current rel at all, then the + * current rel should not be grouped, as we do not support joining two + * grouped relations. + */ + if (!bms_is_subset(ac_info->agg_eval_at, rel->relids)) + return false; + } + + return true; +} + +/* + * init_grouping_targets + * Initialize the target for grouped paths (target) as well as the target + * for paths that generate input for the grouped paths (agg_input). + * + * We also construct the list of SortGroupClauses and the list of grouping + * expressions for the partial aggregation, and return them in *group_clauses + * and *group_exprs. + * + * Return true if the targets could be initialized, false otherwise. 
+ */ +static bool +init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, + PathTarget *target, PathTarget *agg_input, + List **group_clauses, List **group_exprs) +{ + ListCell *lc; + List *possibly_dependent = NIL; + Index maxSortGroupRef; + + /* Identify the max sortgroupref */ + maxSortGroupRef = 0; + foreach(lc, root->processed_tlist) + { + Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref; + + if (ref > maxSortGroupRef) + maxSortGroupRef = ref; + } + + foreach(lc, rel->reltarget->exprs) + { + Expr *expr = (Expr *) lfirst(lc); + Index sortgroupref; + + /* + * Given that PlaceHolderVar currently prevents us from doing eager + * aggregation, the source target cannot contain anything more complex + * than a Var. + */ + Assert(IsA(expr, Var)); + + /* Get the sortgroupref if the expr can act as grouping expression. */ + sortgroupref = get_expression_sortgroupref(root, expr); + if (sortgroupref > 0) + { + SortGroupClause *sgc; + + /* Find the matching SortGroupClause */ + sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause); + Assert(sgc->tleSortGroupRef <= maxSortGroupRef); + + /* + * If the target expression can be used as a grouping key, it + * should be emitted by the grouped paths that have been pushed + * down to this relation level. + */ + add_column_to_pathtarget(target, expr, sortgroupref); + + /* + * ... and it also should be emitted by the input paths. + */ + add_column_to_pathtarget(agg_input, expr, sortgroupref); + + /* + * Record this SortGroupClause and grouping expression. Note that + * this SortGroupClause might have already been recorded. 
+ */ + if (!list_member(*group_clauses, sgc)) + { + *group_clauses = lappend(*group_clauses, sgc); + *group_exprs = lappend(*group_exprs, expr); + } + } + else if (is_var_needed_by_join(root, (Var *) expr, rel)) + { + /* + * The expression is needed for an upper join but is neither in + * the GROUP BY clause nor derivable from it using EC (otherwise, + * it would have already been included in the targets above). We + * need to create a special SortGroupClause for this expression. + */ + SortGroupClause *sgc = makeNode(SortGroupClause); + + /* Initialize the SortGroupClause. */ + sgc->tleSortGroupRef = ++maxSortGroupRef; + get_sort_group_operators((castNode(Var, expr))->vartype, + false, true, false, + &sgc->sortop, &sgc->eqop, NULL, + &sgc->hashable); + + /* This expression should be emitted by the grouped paths */ + add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef); + + /* ... and it also should be emitted by the input paths. */ + add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef); + + /* Record this SortGroupClause and grouping expression */ + *group_clauses = lappend(*group_clauses, sgc); + *group_exprs = lappend(*group_exprs, expr); + } + else if (is_var_in_aggref_only(root, (Var *) expr)) + { + /* + * The expression is referenced by an aggregate function pushed + * down to this relation and does not appear elsewhere in the + * targetlist or havingQual. Add it to 'agg_input' but not to + * 'target'. + */ + add_new_column_to_pathtarget(agg_input, expr); + } + else + { + /* + * The expression may be functionally dependent on other + * expressions in the target, but we cannot verify this until all + * target expressions have been constructed. + */ + possibly_dependent = lappend(possibly_dependent, expr); + } + } + + /* + * Now we can verify whether an expression is functionally dependent on + * others. 
+ */ + foreach(lc, possibly_dependent) + { + Var *tvar; + List *deps = NIL; + RangeTblEntry *rte; + + tvar = lfirst_node(Var, lc); + rte = root->simple_rte_array[tvar->varno]; + + if (check_functional_grouping(rte->relid, tvar->varno, + tvar->varlevelsup, + target->exprs, &deps)) + { + /* + * The expression is functionally dependent on other target + * expressions, so it can be included in the targets. Since it + * will not be used as a grouping key, a sortgroupref is not + * needed for it. + */ + add_new_column_to_pathtarget(target, (Expr *) tvar); + add_new_column_to_pathtarget(agg_input, (Expr *) tvar); + } + else + { + /* + * We may arrive here with a grouping expression that is proven + * redundant by EquivalenceClass processing, such as 't1.a' in the + * query below. + * + * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a, + * t1.b; + * + * For now we just give up in this case. + */ + return false; + } + } + + return true; +} + +/* + * is_var_in_aggref_only + * Check whether the given Var appears in aggregate expressions and not + * elsewhere in the targetlist or havingQual. + */ +static bool +is_var_in_aggref_only(PlannerInfo *root, Var *var) +{ + ListCell *lc; + + /* + * Search the list of aggregate expressions for the Var. + */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + List *vars; + + Assert(IsA(ac_info->aggref, Aggref)); + + if (!bms_is_member(var->varno, ac_info->agg_eval_at)) + continue; + + vars = pull_var_clause((Node *) ac_info->aggref, + PVC_RECURSE_AGGREGATES | + PVC_RECURSE_WINDOWFUNCS | + PVC_RECURSE_PLACEHOLDERS); + + if (list_member(vars, var)) + { + list_free(vars); + break; + } + + list_free(vars); + } + + return (lc != NULL && !list_member(root->tlist_vars, var)); +} + +/* + * is_var_needed_by_join + * Check if the given Var is needed by joins above the current rel. 
+ * + * Consider pushing the aggregate avg(b.y) down to relation b for the following + * query: + * + * SELECT a.i, avg(b.y) + * FROM a JOIN b ON a.j = b.j + * GROUP BY a.i; + * + * Column b.j needs to be used as the grouping key because otherwise it cannot + * find its way to the input of the join expression. + */ +static bool +is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel) +{ + Relids relids; + int attno; + RelOptInfo *baserel; + + /* + * Note that when checking if the Var is needed by joins above, we want to + * exclude cases where the Var is only needed in the final output. So + * include "relation 0" in the check. + */ + relids = bms_copy(rel->relids); + relids = bms_add_member(relids, 0); + + baserel = find_base_rel(root, var->varno); + attno = var->varattno - baserel->min_attr; + + return bms_nonempty_difference(baserel->attr_needed[attno], relids); +} + +/* + * get_expression_sortgroupref + * Return sortgroupref if the given 'expr' can be used as a grouping key in + * grouped paths for base or join relations, or 0 otherwise. + * + * We first check if 'expr' is among the grouping expressions. If it is not, + * we then check if 'expr' is known equal to any of the grouping expressions + * due to equivalence relationships. + */ +static Index +get_expression_sortgroupref(PlannerInfo *root, Expr *expr) +{ + ListCell *lc; + + foreach(lc, root->group_expr_list) + { + GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc); + + Assert(IsA(ge_info->expr, Var)); + + if (equal(ge_info->expr, expr) || + exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr, + ge_info->btree_opfamily)) + { + Assert(ge_info->sortgroupref > 0); + + return ge_info->sortgroupref; + } + } + + /* The expression cannot be used as a grouping key. 
*/ + return 0; +} diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index af227b1f24..2796448056 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] = false, NULL, NULL, NULL }, + { + {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD, + gettext_noop("Enables eager aggregation."), + NULL, + GUC_EXPLAIN + }, + &enable_eager_aggregate, + false, + NULL, NULL, NULL + }, { {"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD, gettext_noop("Enables the planner's use of parallel append plans."), diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 667e0dc40a..2e9df56cf4 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -413,6 +413,7 @@ #enable_sort = on #enable_tidscan = on #enable_group_by_reordering = on +#enable_eager_aggregate = off # - Planner Cost Constants - diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 1951ae7c11..815c14c71d 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -387,6 +387,15 @@ struct PlannerInfo /* list of PlaceHolderInfos */ List *placeholder_list; + /* list of AggClauseInfos */ + List *agg_clause_list; + + /* list of GroupExprInfos */ + List *group_expr_list; + + /* list of plain Vars contained in targetlist and havingQual */ + List *tlist_vars; + /* array of PlaceHolderInfos indexed by phid */ struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size)); /* allocated size of array */ @@ -429,6 +438,12 @@ struct PlannerInfo */ RelInfoList upper_rels[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); + /* + * list of grouped-relation RelAggInfos, with one instance per item of the + * upper_rels[UPPERREL_PARTIAL_GROUP_AGG] list. 
+ */ + RelInfoList *agg_info_list; + /* Result tlists chosen by grouping_planner for upper-stage processing */ struct PathTarget *upper_targets[UPPERREL_FINAL + 1] pg_node_attr(read_write_ignore); @@ -1079,6 +1094,56 @@ typedef struct RelOptInfo ((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \ (rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs) +/* + * RelAggInfo + * Information needed to create grouped paths for base and join rels. + * + * "relids" is the set of relation identifiers (RT indexes). + * + * "target" is the output tlist for the grouped paths. + * + * "agg_input" is the output tlist for the paths that provide input to the + * grouped paths. One difference from the reltarget of the non-grouped + * relation is that agg_input has its sortgrouprefs[] initialized. + * + * "grouped_rows" is the estimated number of result tuples of the grouped + * relation. + * + * "group_clauses", "group_exprs" and "group_pathkeys" are lists of + * SortGroupClauses, the corresponding grouping expressions and PathKeys + * respectively. 
+ */ +typedef struct RelAggInfo +{ + pg_node_attr(no_copy_equal, no_read, no_query_jumble) + + NodeTag type; + + /* set of base + OJ relids (rangetable indexes) */ + Relids relids; + + /* + * default result targetlist for Paths scanning this grouped relation; + * list of Vars/Exprs, cost, width + */ + struct PathTarget *target; + + /* + * the targetlist for Paths that provide input to the grouped paths + */ + struct PathTarget *agg_input; + + /* estimated number of result tuples */ + Cardinality grouped_rows; + + /* a list of SortGroupClauses */ + List *group_clauses; + /* a list of grouping expressions */ + List *group_exprs; + /* a list of PathKeys */ + List *group_pathkeys; +} RelAggInfo; + /* * IndexOptInfo * Per-index information for planning/optimization @@ -3147,6 +3212,41 @@ typedef struct MinMaxAggInfo Param *param; } MinMaxAggInfo; +/* + * The aggregate expressions that appear in targetlist and having clauses + */ +typedef struct AggClauseInfo +{ + pg_node_attr(no_read, no_query_jumble) + + NodeTag type; + + /* the Aggref expr */ + Aggref *aggref; + + /* lowest level we can evaluate this aggregate at */ + Relids agg_eval_at; +} AggClauseInfo; + +/* + * The grouping expressions that appear in grouping clauses + */ +typedef struct GroupExprInfo +{ + pg_node_attr(no_read, no_query_jumble) + + NodeTag type; + + /* the represented expression */ + Expr *expr; + + /* the tleSortGroupRef of the corresponding SortGroupClause */ + Index sortgroupref; + + /* btree opfamily defining the ordering */ + Oid btree_opfamily; +} GroupExprInfo; + /* * At runtime, PARAM_EXEC slots are used to pass values around from one plan * node to another. 
They can be used to pass values down into subqueries (for diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index f00bd55f39..d5282e916b 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -310,10 +310,18 @@ extern void setup_simple_rel_arrays(PlannerInfo *root); extern void expand_planner_arrays(PlannerInfo *root, int add_size); extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent); +extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, int relid, + RelAggInfo **agg_info_p); +extern RelOptInfo *build_grouped_rel(PlannerInfo *root, + RelOptInfo *rel_plain); extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid); extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid); extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid); extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids); +extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel, + RelAggInfo *agg_info); +extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids, + RelAggInfo **agg_info_p); extern RelOptInfo *build_join_rel(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer_rel, @@ -349,4 +357,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root, SpecialJoinInfo *sjinfo, int nappinfos, AppendRelInfo **appinfos); +extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel); #endif /* PATHNODE_H */ diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index 970499c469..9392a27a4d 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -21,6 +21,7 @@ * allpaths.c */ extern PGDLLIMPORT bool enable_geqo; +extern PGDLLIMPORT bool enable_eager_aggregate; extern PGDLLIMPORT int geqo_threshold; extern PGDLLIMPORT int min_parallel_table_scan_size; extern PGDLLIMPORT int min_parallel_index_scan_size; @@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo 
*root, RelOptInfo *rel, bool override_rows); extern void generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows); +extern void generate_grouped_paths(PlannerInfo *root, + RelOptInfo *rel_grouped, + RelOptInfo *rel_plain, + RelAggInfo *agg_info); extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages, int max_workers); extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel, diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index aafc173792..cedcd88ebf 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root); extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist); extern void add_vars_to_targetlist(PlannerInfo *root, List *vars, Relids where_needed); +extern void setup_eager_aggregation(PlannerInfo *root); extern void find_lateral_references(PlannerInfo *root); extern void create_lateral_join_info(PlannerInfo *root); extern List *deconstruct_jointree(PlannerInfo *root); diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out new file mode 100644 index 0000000000..9f63472eff --- /dev/null +++ b/src/test/regress/expected/eager_aggregate.out @@ -0,0 +1,1308 @@ +-- +-- EAGER AGGREGATION +-- Test we can push aggregation down below join +-- +-- Enable eager aggregation, which by default is disabled. 
+SET enable_eager_aggregate TO on; +CREATE TABLE eager_agg_t1 (a int, b int, c double precision); +CREATE TABLE eager_agg_t2 (a int, b int, c double precision); +CREATE TABLE eager_agg_t3 (a int, b int, c double precision); +INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i; +ANALYZE eager_agg_t1; +ANALYZE eager_agg_t2; +ANALYZE eager_agg_t3; +-- +-- Test eager aggregation over base rel +-- +-- Perform scan of a table, aggregate the result, join it to the other table +-- and finalize the aggregation. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(18 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort 
+ Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial GroupAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Sort + Output: t2.c, t2.b + Sort Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b +(21 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +RESET enable_hashagg; +-- +-- Test eager aggregation over join rel +-- +-- Perform join of tables, aggregate the result, join it to the other table +-- and finalize the aggregation. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg((t2.c + t3.c)) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg((t2.c + t3.c))) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg((t2.c + t3.c)) + Group Key: t2.b + -> Hash Join + Output: t2.c, t2.b, t3.c + Hash Cond: (t3.a = t2.a) + -> Seq Scan on public.eager_agg_t3 t3 + Output: t3.a, t3.b, t3.c + -> Hash + Output: t2.c, t2.b, t2.a + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b, t2.a +(25 rows) + +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + a | 
avg +---+----- + 1 | 497 + 2 | 499 + 3 | 501 + 4 | 503 + 5 | 505 + 6 | 507 + 7 | 509 + 8 | 511 + 9 | 513 +(9 rows) + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg((t2.c + t3.c)) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg((t2.c + t3.c))) + -> Partial GroupAggregate + Output: t2.b, PARTIAL avg((t2.c + t3.c)) + Group Key: t2.b + -> Sort + Output: t2.c, t2.b, t3.c + Sort Key: t2.b + -> Hash Join + Output: t2.c, t2.b, t3.c + Hash Cond: (t3.a = t2.a) + -> Seq Scan on public.eager_agg_t3 t3 + Output: t3.a, t3.b, t3.c + -> Hash + Output: t2.c, t2.b, t2.a + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b, t2.a +(28 rows) + +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 497 + 2 | 499 + 3 | 501 + 4 | 503 + 5 | 505 + 6 | 507 + 7 | 509 + 8 | 511 + 9 | 513 +(9 rows) + +RESET enable_hashagg; +-- +-- Test that eager aggregation works for outer join +-- +-- Ensure aggregation can be pushed down to the non-nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + 
Sort Key: t1.a + -> Hash Right Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(18 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 + | 505 +(10 rows) + +-- Ensure aggregation cannot be pushed down to the nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; + QUERY PLAN +------------------------------------------------------------ + Sort + Output: t2.b, (avg(t2.c)) + Sort Key: t2.b + -> HashAggregate + Output: t2.b, avg(t2.c) + Group Key: t2.b + -> Hash Right Join + Output: t2.b, t2.c + Hash Cond: (t2.b = t1.b) + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c + -> Hash + Output: t1.b + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.b +(15 rows) + +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; + b | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 + | +(10 rows) + +-- +-- Test that eager aggregation works for parallel plans +-- +SET parallel_setup_cost=0; +SET parallel_tuple_cost=0; +SET min_parallel_table_scan_size=0; +SET max_parallel_workers_per_gather=4; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +--------------------------------------------------------------------------------- + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + 
-> Gather Merge + Output: t1.a, (PARTIAL avg(t2.c)) + Workers Planned: 2 + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Parallel Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Parallel Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Parallel Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Parallel Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(21 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +RESET parallel_setup_cost; +RESET parallel_tuple_cost; +RESET min_parallel_table_scan_size; +RESET max_parallel_workers_per_gather; +DROP TABLE eager_agg_t1; +DROP TABLE eager_agg_t2; +DROP TABLE eager_agg_t3; +-- +-- Test eager aggregation for partitionwise join +-- +-- Enable partitionwise aggregate, which by default is disabled. +SET enable_partitionwise_aggregate TO true; +-- Enable partitionwise join, which by default is disabled. 
+SET enable_partitionwise_join TO true; +CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20); +CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30); +CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y); +CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20); +CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30); +INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i; +INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i; +ANALYZE eager_agg_tab1; +ANALYZE eager_agg_tab2; +-- When GROUP BY clause matches; full aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum(t1.y)), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum(t1.y), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + Hash Cond: (t2.y = t1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2 + Output: t2.y + -> Hash + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*) + Group Key: t1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.x, t1.y + -> Finalize HashAggregate + Output: t1_1.x, sum(t1_1.y), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_1 + Output: t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.x, t1_1.y + -> Finalize HashAggregate + Output: t1_2.x, sum(t1_2.y), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_2 + Output: t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.x, t1_2.y +(49 rows) + +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+------+------- + 0 | 500 | 100 + 
6 | 1100 | 100 + 12 | 700 | 100 + 18 | 1300 | 100 + 24 | 900 | 100 +(5 rows) + +-- GROUP BY having other matching key +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t2.y, (sum(t1.y)), (count(*)) + Sort Key: t2.y + -> Append + -> Finalize HashAggregate + Output: t2.y, sum(t1.y), count(*) + Group Key: t2.y + -> Hash Join + Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + Hash Cond: (t2.y = t1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2 + Output: t2.y + -> Hash + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*) + Group Key: t1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.y, t1.x + -> Finalize HashAggregate + Output: t2_1.y, sum(t1_1.y), count(*) + Group Key: t2_1.y + -> Hash Join + Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_1 + Output: t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.y, t1_1.x + -> Finalize HashAggregate + Output: t2_2.y, sum(t1_2.y), count(*) + Group Key: t2_2.y + -> Hash Join + Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_2 + Output: t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.y, t1_2.x +(49 rows) + +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, 
eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; + y | sum | count +----+------+------- + 0 | 500 | 100 + 6 | 1100 | 100 + 12 | 700 | 100 + 18 | 1300 | 100 + 24 | 900 | 100 +(5 rows) + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; + QUERY PLAN +------------------------------------------------------------------------------------------------------------ + Sort + Output: t2.x, (sum(t1.x)), (count(*)) + Sort Key: t2.x + -> Finalize HashAggregate + Output: t2.x, sum(t1.x), count(*) + Group Key: t2.x + Filter: (avg(t1.x) > '10'::numeric) + -> Append + -> Hash Join + Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2_1 + Output: t2_1.x, t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1_1 + Output: t1_1.x + -> Hash Join + Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_2 + Output: t2_2.x, t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_2 + Output: t1_2.x + -> Hash Join + Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x)) + Hash Cond: (t2_3.y = t1_3.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_3 + Output: t2_3.x, t2_3.y + -> Hash + Output: t1_3.x, (PARTIAL 
sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x)) + -> Partial HashAggregate + Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x) + Group Key: t1_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_3 + Output: t1_3.x +(44 rows) + +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; + x | sum | count +----+------+------- + 2 | 600 | 50 + 4 | 1200 | 50 + 8 | 900 | 50 + 12 | 600 | 50 + 14 | 1200 | 50 + 18 | 900 | 50 +(6 rows) + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +------------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum((t2.y + t3.y))) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum((t2.y + t3.y)) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum((t2.y + t3.y))) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))) + -> Partial HashAggregate + Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)) + Group Key: t2.x + -> Hash Join + Output: t2.y, t2.x, t3.y, t3.x + Hash Cond: (t2.x = t3.x) + -> Seq Scan on public.eager_agg_tab1_p1 t2 + Output: t2.y, t2.x + -> Hash + Output: t3.y, t3.x + -> Seq Scan on public.eager_agg_tab1_p1 t3 + Output: t3.y, t3.x + -> Finalize HashAggregate + Output: t1_1.x, sum((t2_1.y + t3_1.y)) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))) + -> Partial HashAggregate + Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + 
t3_1.y)) + Group Key: t2_1.x + -> Hash Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab1_p2 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t3_1 + Output: t3_1.y, t3_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum((t2_2.y + t3_2.y)) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))) + -> Partial HashAggregate + Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)) + Group Key: t2_2.x + -> Hash Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab1_p3 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, t3_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t3_2 + Output: t3_2.y, t3_2.x +(70 rows) + +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + x | sum +----+------- + 0 | 10000 + 2 | 14000 + 4 | 18000 + 6 | 22000 + 8 | 26000 + 10 | 10000 + 12 | 14000 + 14 | 18000 + 16 | 22000 + 18 | 26000 + 20 | 10000 + 22 | 14000 + 24 | 18000 + 26 | 22000 + 28 | 26000 +(15 rows) + +-- partial aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + QUERY PLAN +------------------------------------------------------------------------------------------- + Finalize GroupAggregate + Output: t3.y, sum((t2.y + t3.y)) + Group Key: t3.y + -> Sort + Output: t3.y, (PARTIAL sum((t2.y + t3.y))) + Sort Key: t3.y + -> Append + -> Hash Join + Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))) + Hash Cond: (t2_1.x = t1_1.x) + -> 
Partial GroupAggregate + Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)) + Group Key: t2_1.x, t3_1.y, t3_1.x + -> Incremental Sort + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Sort Key: t2_1.x, t3_1.y + Presorted Key: t2_1.x + -> Merge Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Merge Cond: (t2_1.x = t3_1.x) + -> Sort + Output: t2_1.y, t2_1.x + Sort Key: t2_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Sort + Output: t3_1.y, t3_1.x + Sort Key: t3_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t3_1 + Output: t3_1.y, t3_1.x + -> Hash + Output: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1_1 + Output: t1_1.x + -> Hash Join + Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))) + Hash Cond: (t2_2.x = t1_2.x) + -> Partial GroupAggregate + Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)) + Group Key: t2_2.x, t3_2.y, t3_2.x + -> Incremental Sort + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Sort Key: t2_2.x, t3_2.y + Presorted Key: t2_2.x + -> Merge Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Merge Cond: (t2_2.x = t3_2.x) + -> Sort + Output: t2_2.y, t2_2.x + Sort Key: t2_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t2_2 + Output: t2_2.y, t2_2.x + -> Sort + Output: t3_2.y, t3_2.x + Sort Key: t3_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t3_2 + Output: t3_2.y, t3_2.x + -> Hash + Output: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_2 + Output: t1_2.x + -> Hash Join + Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))) + Hash Cond: (t2_3.x = t1_3.x) + -> Partial GroupAggregate + Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)) + Group Key: t2_3.x, t3_3.y, t3_3.x + -> Incremental Sort + Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x + Sort Key: t2_3.x, t3_3.y + Presorted Key: t2_3.x + -> Merge Join + Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x + Merge Cond: (t2_3.x = t3_3.x) + -> Sort + Output: t2_3.y, t2_3.x + Sort Key: t2_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t2_3 + Output: t2_3.y, t2_3.x + 
-> Sort + Output: t3_3.y, t3_3.x + Sort Key: t3_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t3_3 + Output: t3_3.y, t3_3.x + -> Hash + Output: t1_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_3 + Output: t1_3.x +(88 rows) + +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + y | sum +----+------- + 0 | 7500 + 2 | 13500 + 4 | 19500 + 6 | 25500 + 8 | 31500 + 10 | 22500 + 12 | 28500 + 14 | 34500 + 16 | 40500 + 18 | 46500 +(10 rows) + +RESET enable_hashagg; +DROP TABLE eager_agg_tab1; +DROP TABLE eager_agg_tab2; +-- +-- Test with multi-level partitioning scheme +-- +CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15); +CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20); +CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25); +CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30); +INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i; +ANALYZE eager_agg_tab_ml; +-- When GROUP BY clause matches; full aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum(t2.y)), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum(t2.y), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*)) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*) + Group Key: t2.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t2 + Output: t2.y, t2.x + -> Finalize HashAggregate + Output: t1_1.x, sum(t2_1.y), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*) + Group Key: t2_1.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1 + Output: t2_1.y, t2_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum(t2_2.y), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*) + Group Key: t2_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2 + Output: t2_2.y, t2_2.x + -> Finalize HashAggregate + Output: t1_3.x, sum(t2_3.y), count(*) + Group Key: t1_3.x + -> Hash Join + Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + 
Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3 + Output: t1_3.x + -> Hash + Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*) + Group Key: t2_3.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3 + Output: t2_3.y, t2_3.x + -> Finalize HashAggregate + Output: t1_4.x, sum(t2_4.y), count(*) + Group Key: t1_4.x + -> Hash Join + Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4 + Output: t1_4.x + -> Hash + Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*) + Group Key: t2_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4 + Output: t2_4.y, t2_4.x +(79 rows) + +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+-------+------- + 0 | 0 | 1089 + 1 | 1156 | 1156 + 2 | 2312 | 1156 + 3 | 3468 | 1156 + 4 | 4624 | 1156 + 5 | 5780 | 1156 + 6 | 6936 | 1156 + 7 | 8092 | 1156 + 8 | 9248 | 1156 + 9 | 10404 | 1156 + 10 | 11560 | 1156 + 11 | 11979 | 1089 + 12 | 13068 | 1089 + 13 | 14157 | 1089 + 14 | 15246 | 1089 + 15 | 16335 | 1089 + 16 | 17424 | 1089 + 17 | 18513 | 1089 + 18 | 19602 | 1089 + 19 | 20691 | 1089 + 20 | 21780 | 1089 + 21 | 22869 | 1089 + 22 | 23958 | 1089 + 23 | 25047 | 1089 + 24 | 26136 | 1089 + 25 | 27225 | 1089 + 26 | 28314 | 1089 + 27 | 29403 | 1089 + 28 | 30492 | 1089 + 29 | 31581 | 1089 +(30 rows) + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.y, (sum(t2.y)), (count(*)) + Sort Key: t1.y + -> Finalize HashAggregate + Output: t1.y, sum(t2.y), count(*) + Group Key: t1.y + -> Append + -> Hash Join + Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1 + Output: t1_1.y, t1_1.x + -> Hash + Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*) + Group Key: t2_1.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash Join + Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2 + Output: t1_2.y, t1_2.x + -> Hash + Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*) + Group Key: t2_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2 + Output: t2_2.y, t2_2.x + -> Hash Join + Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3 + Output: t1_3.y, t1_3.x + -> Hash + Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*) + Group Key: t2_3.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3 + Output: t2_3.y, t2_3.x + -> Hash Join + Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4 + Output: t1_4.y, t1_4.x + -> Hash + Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + -> Partial HashAggregate + 
Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*) + Group Key: t2_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4 + Output: t2_4.y, t2_4.x + -> Hash Join + Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*)) + Hash Cond: (t1_5.x = t2_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5 + Output: t1_5.y, t1_5.x + -> Hash + Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*) + Group Key: t2_5.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5 + Output: t2_5.y, t2_5.x +(67 rows) + +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; + y | sum | count +----+-------+------- + 0 | 0 | 1089 + 1 | 1156 | 1156 + 2 | 2312 | 1156 + 3 | 3468 | 1156 + 4 | 4624 | 1156 + 5 | 5780 | 1156 + 6 | 6936 | 1156 + 7 | 8092 | 1156 + 8 | 9248 | 1156 + 9 | 10404 | 1156 + 10 | 11560 | 1156 + 11 | 11979 | 1089 + 12 | 13068 | 1089 + 13 | 14157 | 1089 + 14 | 15246 | 1089 + 15 | 16335 | 1089 + 16 | 17424 | 1089 + 17 | 18513 | 1089 + 18 | 19602 | 1089 + 19 | 20691 | 1089 + 20 | 21780 | 1089 + 21 | 22869 | 1089 + 22 | 23958 | 1089 + 23 | 25047 | 1089 + 24 | 26136 | 1089 + 25 | 27225 | 1089 + 26 | 28314 | 1089 + 27 | 29403 | 1089 + 28 | 30492 | 1089 + 29 | 31581 | 1089 +(30 rows) + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +---------------------------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum((t2.y + t3.y))), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum((t2.y + t3.y)), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL 
count(*)) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*) + Group Key: t2.x + -> Hash Join + Output: t2.y, t2.x, t3.y, t3.x + Hash Cond: (t2.x = t3.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t2 + Output: t2.y, t2.x + -> Hash + Output: t3.y, t3.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t3 + Output: t3.y, t3.x + -> Finalize HashAggregate + Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*) + Group Key: t2_1.x + -> Hash Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1 + Output: t3_1.y, t3_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*) + Group Key: t2_2.x + -> Hash Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, 
t3_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2 + Output: t3_2.y, t3_2.x + -> Finalize HashAggregate + Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*) + Group Key: t1_3.x + -> Hash Join + Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3 + Output: t1_3.x + -> Hash + Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*) + Group Key: t2_3.x + -> Hash Join + Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x + Hash Cond: (t2_3.x = t3_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3 + Output: t2_3.y, t2_3.x + -> Hash + Output: t3_3.y, t3_3.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3 + Output: t3_3.y, t3_3.x + -> Finalize HashAggregate + Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*) + Group Key: t1_4.x + -> Hash Join + Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4 + Output: t1_4.x + -> Hash + Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*) + Group Key: t2_4.x + -> Hash Join + Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x + Hash Cond: (t2_4.x = t3_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4 + Output: t2_4.y, t2_4.x + -> Hash + Output: t3_4.y, t3_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4 + Output: t3_4.y, t3_4.x +(114 rows) + +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+---------+------- + 0 | 0 | 35937 + 1 | 78608 | 39304 + 2 | 157216 | 39304 + 3 | 235824 | 39304 + 4 | 314432 | 39304 + 5 | 393040 | 39304 + 6 | 471648 | 39304 
+ 7 | 550256 | 39304 + 8 | 628864 | 39304 + 9 | 707472 | 39304 + 10 | 786080 | 39304 + 11 | 790614 | 35937 + 12 | 862488 | 35937 + 13 | 934362 | 35937 + 14 | 1006236 | 35937 + 15 | 1078110 | 35937 + 16 | 1149984 | 35937 + 17 | 1221858 | 35937 + 18 | 1293732 | 35937 + 19 | 1365606 | 35937 + 20 | 1437480 | 35937 + 21 | 1509354 | 35937 + 22 | 1581228 | 35937 + 23 | 1653102 | 35937 + 24 | 1724976 | 35937 + 25 | 1796850 | 35937 + 26 | 1868724 | 35937 + 27 | 1940598 | 35937 + 28 | 2012472 | 35937 + 29 | 2084346 | 35937 +(30 rows) + +-- partial aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------ + Sort + Output: t3.y, (sum((t2.y + t3.y))), (count(*)) + Sort Key: t3.y + -> Finalize HashAggregate + Output: t3.y, sum((t2.y + t3.y)), count(*) + Group Key: t3.y + -> Append + -> Hash Join + Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*) + Group Key: t2_1.x, t3_1.y, t3_1.x + -> Hash Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1 + Output: t3_1.y, t3_1.x + -> Hash Join + Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, t3_2.y, t3_2.x, 
(PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*) + Group Key: t2_2.x, t3_2.y, t3_2.x + -> Hash Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, t3_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2 + Output: t3_2.y, t3_2.x + -> Hash Join + Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3 + Output: t1_3.x + -> Hash + Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*) + Group Key: t2_3.x, t3_3.y, t3_3.x + -> Hash Join + Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x + Hash Cond: (t2_3.x = t3_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3 + Output: t2_3.y, t2_3.x + -> Hash + Output: t3_3.y, t3_3.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3 + Output: t3_3.y, t3_3.x + -> Hash Join + Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4 + Output: t1_4.x + -> Hash + Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*) + Group Key: t2_4.x, t3_4.y, t3_4.x + -> Hash Join + Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x + Hash Cond: (t2_4.x = t3_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4 + Output: t2_4.y, t2_4.x + -> Hash + Output: t3_4.y, t3_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4 + Output: t3_4.y, t3_4.x + -> Hash Join + Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*)) + Hash Cond: (t1_5.x = 
t2_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5 + Output: t1_5.x + -> Hash + Output: t2_5.x, t3_5.y, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_5.x, t3_5.y, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*) + Group Key: t2_5.x, t3_5.y, t3_5.x + -> Hash Join + Output: t2_5.y, t2_5.x, t3_5.y, t3_5.x + Hash Cond: (t2_5.x = t3_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5 + Output: t2_5.y, t2_5.x + -> Hash + Output: t3_5.y, t3_5.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5 + Output: t3_5.y, t3_5.x +(102 rows) + +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + y | sum | count +----+---------+------- + 0 | 0 | 35937 + 1 | 78608 | 39304 + 2 | 157216 | 39304 + 3 | 235824 | 39304 + 4 | 314432 | 39304 + 5 | 393040 | 39304 + 6 | 471648 | 39304 + 7 | 550256 | 39304 + 8 | 628864 | 39304 + 9 | 707472 | 39304 + 10 | 786080 | 39304 + 11 | 790614 | 35937 + 12 | 862488 | 35937 + 13 | 934362 | 35937 + 14 | 1006236 | 35937 + 15 | 1078110 | 35937 + 16 | 1149984 | 35937 + 17 | 1221858 | 35937 + 18 | 1293732 | 35937 + 19 | 1365606 | 35937 + 20 | 1437480 | 35937 + 21 | 1509354 | 35937 + 22 | 1581228 | 35937 + 23 | 1653102 | 35937 + 24 | 1724976 | 35937 + 25 | 1796850 | 35937 + 26 | 1868724 | 35937 + 27 | 1940598 | 35937 + 28 | 2012472 | 35937 + 29 | 2084346 | 35937 +(30 rows) + +DROP TABLE eager_agg_tab_ml; diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out index fad7fc3a7e..1dda69e7c2 100644 --- a/src/test/regress/expected/sysviews.out +++ b/src/test/regress/expected/sysviews.out @@ -150,6 +150,7 @@ select name, setting from pg_settings where name like 'enable%'; --------------------------------+--------- enable_async_append | on enable_bitmapscan | on + enable_eager_aggregate | off enable_gathermerge | on 
enable_group_by_reordering | on enable_hashagg | on @@ -170,7 +171,7 @@ select name, setting from pg_settings where name like 'enable%'; enable_seqscan | on enable_sort | on enable_tidscan | on -(22 rows) +(23 rows) -- There are always wait event descriptions for various types. InjectionPoint -- may be present or absent, depending on history since last postmaster start. diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule index 2429ec2bba..d5697e5655 100644 --- a/src/test/regress/parallel_schedule +++ b/src/test/regress/parallel_schedule @@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr # The stats test resets stats, so nothing else needing stats access can be in # this group. # ---------- -test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate +test: partition_merge partition_split partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate # event_trigger depends on create_am and cannot run concurrently with # any test that runs DDL diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql new file mode 100644 index 0000000000..4050e4df44 --- /dev/null +++ b/src/test/regress/sql/eager_aggregate.sql @@ -0,0 +1,192 @@ +-- +-- EAGER AGGREGATION +-- Test we can push aggregation down below join +-- + +-- Enable eager aggregation, which by default is disabled. 
+SET enable_eager_aggregate TO on; + +CREATE TABLE eager_agg_t1 (a int, b int, c double precision); +CREATE TABLE eager_agg_t2 (a int, b int, c double precision); +CREATE TABLE eager_agg_t3 (a int, b int, c double precision); + +INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i; + +ANALYZE eager_agg_t1; +ANALYZE eager_agg_t2; +ANALYZE eager_agg_t3; + + +-- +-- Test eager aggregation over base rel +-- + +-- Perform scan of a table, aggregate the result, join it to the other table +-- and finalize the aggregation. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; + +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + +RESET enable_hashagg; + + +-- +-- Test eager aggregation over join rel +-- + +-- Perform join of tables, aggregate the result, join it to the other table +-- and finalize the aggregation. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; + +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + +RESET enable_hashagg; + + +-- +-- Test that eager aggregation works for outer join +-- + +-- Ensure aggregation can be pushed down to the non-nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + +-- Ensure aggregation cannot be pushed down to the nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; + + +-- +-- Test that eager aggregation works for parallel plans +-- + +SET parallel_setup_cost=0; +SET parallel_tuple_cost=0; +SET min_parallel_table_scan_size=0; +SET max_parallel_workers_per_gather=4; + +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + +RESET parallel_setup_cost; +RESET 
parallel_tuple_cost; +RESET min_parallel_table_scan_size; +RESET max_parallel_workers_per_gather; + + +DROP TABLE eager_agg_t1; +DROP TABLE eager_agg_t2; +DROP TABLE eager_agg_t3; + + +-- +-- Test eager aggregation for partitionwise join +-- + +-- Enable partitionwise aggregate, which by default is disabled. +SET enable_partitionwise_aggregate TO true; +-- Enable partitionwise join, which by default is disabled. +SET enable_partitionwise_join TO true; + +CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20); +CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30); +CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y); +CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20); +CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30); +INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i; +INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i; + +ANALYZE eager_agg_tab1; +ANALYZE eager_agg_tab2; + +-- When GROUP BY clause matches; full aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; + +-- GROUP BY having other matching key +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + +-- partial aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; +RESET enable_hashagg; + +DROP TABLE eager_agg_tab1; +DROP TABLE eager_agg_tab2; + + +-- +-- Test with multi-level partitioning scheme +-- +CREATE TABLE 
eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15); +CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20); +CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25); +CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30); +INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i; + +ANALYZE eager_agg_tab_ml; + +-- When GROUP BY clause matches; full aggregation is performed for each partition. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + +-- partial aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + +DROP TABLE eager_agg_tab_ml; diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 502c748ecd..7eec1281e6 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -41,6 +41,7 @@ AfterTriggersTableData AfterTriggersTransData Agg AggClauseCosts +AggClauseInfo AggInfo AggPath AggSplit @@ -1061,6 +1062,7 @@ GrantTargetType Group GroupByOrdering GroupClause +GroupExprInfo GroupPath GroupPathExtraData GroupResultPath @@ -2371,6 +2373,7 @@ ReindexObjectType ReindexParams ReindexStmt ReindexType +RelAggInfo RelFileLocator RelFileLocatorBackend RelFileNumber @@ -2379,6 +2382,7 @@ RelInfo RelInfoArr RelInfoEntry RelInfoList +RelInfoListInfo RelMapFile RelMapping RelOptInfo -- 2.43.0 ^ permalink raw reply [nested|flat] 30+ messages in thread
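A note on the expected output of the multi-level partitioning test above: the counts and sums can be checked by hand. The INSERT loads (i % 30, i % 30) for i in 1..1000, so x always equals y, and each value occurs either 33 or 34 times. A self-join on x therefore yields freq^2 rows per value and a three-way join yields freq^3. The following is a minimal sketch of that arithmetic in C. The function names are hypothetical and exist only for this illustration; nothing here is part of the patch.

```c
/*
 * Illustration only: recompute the expected results of the
 * eager_agg_tab_ml regression queries from the INSERT statement
 *   INSERT INTO eager_agg_tab_ml
 *     SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
 * All names below are hypothetical helpers, not PostgreSQL code.
 */
#define NUM_VALUES 30

/* Fill freq[v] with the number of inserted rows whose x (and y) equals v. */
void
count_rows_per_value(long freq[NUM_VALUES])
{
    for (int v = 0; v < NUM_VALUES; v++)
        freq[v] = 0;
    for (int i = 1; i <= 1000; i++)
        freq[i % NUM_VALUES]++;
}

/*
 * count(*) for group v after equi-joining ntables copies of the table
 * on x: every copy contributes freq[v] matching rows.
 */
long
expected_count(const long freq[NUM_VALUES], int v, int ntables)
{
    long result = 1;

    for (int k = 0; k < ntables; k++)
        result *= freq[v];
    return result;
}

/*
 * sum() of an expression that adds up nterms y columns, each equal to v
 * within group v (for example nterms = 2 for sum(t2.y + t3.y)).
 */
long
expected_sum(const long freq[NUM_VALUES], int v, int ntables, int nterms)
{
    return (long) nterms * v * expected_count(freq, v, ntables);
}
```

For the two-table queries this gives 33 * 33 = 1089 or 34 * 34 = 1156 rows per group, and for the three-table queries 33^3 = 35937 or 34^3 = 39304, matching the result sets shown in the expected output.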
* Re: Eager aggregation, take 3 @ 2024-08-23 15:59 Robert Haas <[email protected]> parent: Richard Guo <[email protected]> 4 siblings, 1 reply; 30+ messages in thread From: Robert Haas @ 2024-08-23 15:59 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] On Wed, Aug 21, 2024 at 3:11 AM Richard Guo <[email protected]> wrote: > Attached is the updated version of the patchset that fixes this bug > and includes further code refactoring. Here are some initial, high-level thoughts about this patch set. 1. As far as I can see, there's no real performance testing on this thread. I expect that it's possible to show an arbitrarily large gain for the patch by finding a case where partial aggregation is way better than anything we currently know, but that's not very interesting. What I think would be useful to do is find a corpus of existing queries on an existing data set and try them with and without the patch and see which query plans change and whether they're actually better. For example, maybe TPC-H or the subset of TPC-DS that we can actually run would be a useful starting point. One could then also measure how much the planning time increases with the patch to get a sense of what the overhead of enabling this feature would be. Even if it's disabled by default, people aren't going to want to enable it if it causes planning times to become much longer on many queries for which there is no benefit. 2. I think there might be techniques we could use to limit planning effort at an earlier stage when the approach doesn't appear promising. For example, if the proposed grouping column is already unique, the exercise is pointless (I think). Ideally we'd like to detect that without even creating the grouped_rel. But the proposed grouping column might also be *mostly* unique. For example, consider a table with a million rows and a column with 500,000 distinct values.
I suspect it will be difficult for partial aggregation to work out to a win in a case like this, because I think that the cost of performing the partial aggregation will not reduce the cost either of the final aggregation or of the intervening join steps by enough to compensate. It would be best to find a way to avoid generating a lot of rels and paths in cases where there's really not much hope of a win. One could, perhaps, imagine going further with this by postponing eager aggregation planning until after regular paths have been built, so that we have good cardinality estimates. Suppose the query joins a single fact table to a series of dimension tables. The final plan thus uses the fact table as the driving table and joins to the dimension tables one by one. Do we really need to consider partial aggregation at every level? Perhaps just where there's been a significant row count reduction since the last time we tried it, but at the next level the row count will increase again? Maybe there are other heuristics we could use in addition or instead. 3. In general, we are quite bad at estimating what will happen to the row count after an aggregation, and we have no real idea what the distribution of values will be. That might be a problem for this patch, because it seems like the decisions we will make about where to perform the partial aggregation might end up being quite random. At the top of the join tree, I'll need to compare directly aggregating the best join path with various paths that involve a finalize aggregation step at the top and a partial aggregation step further down. But my cost estimates and row counts for the partial aggregate steps seem like they will often be quite poor, which means that the plans that use those partial aggregate steps might also be quite poor. Even if they're not, I fear that comparing the cost of those PartialAggregate-Join(s)-FinalizeAggregate paths to the direct Aggregate path will look too much like comparing random numbers. 
We need to know whether the combination of the FinalizeAggregate step and the PartialAggregate step will be more or less expensive than a plain old Aggregate, but how can we tell that if we don't have accurate cardinality estimates? Thanks for working on this. -- Robert Haas EDB: http://www.enterprisedb.com ^ permalink raw reply [nested|flat] 30+ messages in thread
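To make point 2 of the message above concrete, here is a minimal sketch of the kind of early check it describes: skip eager aggregation when the proposed grouping key is unique or mostly unique. This is an illustration under stated assumptions, not code from the patch set. The function name, its parameters, and the idea of a fixed reduction threshold are all hypothetical, and the check is only as good as the group-count estimate it is given, which point 3 notes is often unreliable.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, NOT part of the patch set or of PostgreSQL.
 *
 * input_rows    - estimated number of rows fed into the partial aggregate
 * est_groups    - estimated number of distinct grouping-key values
 * min_reduction - assumed tunable: the minimum factor by which partial
 *                 aggregation must shrink the row count to be considered
 *
 * Returns true if pushing a partial aggregate down looks worth planning.
 */
bool
eager_agg_looks_promising(double input_rows, double est_groups,
                          double min_reduction)
{
    /* No usable estimates: do not bother. */
    if (input_rows <= 0.0 || est_groups <= 0.0)
        return false;

    /* Grouping key is estimated to be unique: aggregation removes no rows. */
    if (est_groups >= input_rows)
        return false;

    /* "Mostly unique" keys give too small a reduction to pay off. */
    return (input_rows / est_groups) >= min_reduction;
}
```

With an assumed threshold of 10, the example from the message (a million rows, 500,000 distinct values) reduces the row count only by a factor of 2 and would be rejected before any grouped rel or path is built, while a key with 10 distinct values over the same table would pass.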
* Re: Eager aggregation, take 3 @ 2024-08-28 03:57 Tender Wang <[email protected]> parent: Richard Guo <[email protected]> 4 siblings, 2 replies; 30+ messages in thread From: Tender Wang @ 2024-08-28 03:57 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Robert Haas <[email protected]> On Wed, Aug 21, 2024 at 15:11, Richard Guo <[email protected]> wrote: > On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <[email protected]> > wrote: > > I had a self-review of this patchset and made some refactoring, > > especially to the function that creates the RelAggInfo structure for a > > given relation. While there were no major changes, the code should > > now be simpler. > > I found a bug in v10 patchset: when we generate the GROUP BY clauses > for the partial aggregation that is pushed down to a non-aggregated > relation, we may produce a clause with a tleSortGroupRef that > duplicates one already present in the query's groupClause, which would > cause problems. > > Attached is the updated version of the patchset that fixes this bug > and includes further code refactoring. > Recently, I ran some benchmark tests, mainly on TPC-H and TPC-DS. The TPC-H tests showed no plan differences, so I did not continue testing on TPC-H. The TPC-DS (10GB) tests showed 22 plan differences, in the following queries: 4.sql, 5.sql, 8.sql, 11.sql, 19.sql, 23.sql, 31.sql, 33.sql, 39.sql, 45.sql, 46.sql, 47.sql, 53.sql, 56.sql, 57.sql, 60.sql, 63.sql, 68.sql, 74.sql, 77.sql, 80.sql, 89.sql I haven't looked at all of them yet. I just picked a few with simple plans (e.g. 19.sql, 45.sql). For example, in 19.sql, eager aggregation pushdown doesn't yield a large gain; instead there is a small performance regression. I will continue benchmarking this feature. [1] https://github.com/tenderwg/eager_agg -- Tender Wang ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-08-28 13:00 Robert Haas <[email protected]> parent: Tender Wang <[email protected]> 1 sibling, 2 replies; 30+ messages in thread From: Robert Haas @ 2024-08-28 13:00 UTC (permalink / raw) To: Tender Wang <[email protected]>; +Cc: Richard Guo <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] On Tue, Aug 27, 2024 at 11:57 PM Tender Wang <[email protected]> wrote: > Recently, I ran some benchmark tests, mainly on TPC-H and TPC-DS. > The TPC-H tests showed no plan differences, so I did not continue testing on TPC-H. Interesting to know. > The TPC-DS (10GB) tests showed 22 plan differences, in the following queries: > 4.sql, 5.sql, 8.sql, 11.sql, 19.sql, 23.sql, 31.sql, 33.sql, 39.sql, 45.sql, 46.sql, 47.sql, 53.sql, > 56.sql, 57.sql, 60.sql, 63.sql, 68.sql, 74.sql, 77.sql, 80.sql, 89.sql OK. > I haven't looked at all of them yet. I just picked a few with simple plans (e.g. 19.sql, 45.sql). > For example, in 19.sql, eager aggregation pushdown doesn't yield a large gain; instead there is a small > performance regression. Yeah, this is one of the things I was worried about in my previous reply to Richard. It would be worth Richard, or someone, probing into exactly why that's happening. My fear is that we just don't have good enough estimates to make good decisions, but there might well be another explanation. > I will continue benchmarking this feature. > > [1] https://github.com/tenderwg/eager_agg Thanks! -- Robert Haas EDB: http://www.enterprisedb.com ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-08-29 02:26 Richard Guo <[email protected]> parent: Robert Haas <[email protected]> 0 siblings, 2 replies; 30+ messages in thread From: Richard Guo @ 2024-08-29 02:26 UTC (permalink / raw) To: Robert Haas <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] On Fri, Aug 23, 2024 at 11:59 PM Robert Haas <[email protected]> wrote: > Here are some initial, high-level thoughts about this patch set. Thank you for your review and feedback! It helps a lot in moving this work forward. > 1. As far as I can see, there's no real performance testing on this > thread. I expect that it's possible to show an arbitrarily large gain > for the patch by finding a case where partial aggregation is way > better than anything we currently know, but that's not very > interesting. What I think would be useful to do is find a corpus of > existing queries on an existing data set and try them with and without > the patch and see which query plans change and whether they're > actually better. For example, maybe TPC-H or the subset of TPC-DS that > we can actually run would be a useful starting point. One could then > also measure how much the planning time increases with the patch to > get a sense of what the overhead of enabling this feature would be. > Even if it's disabled by default, people aren't going to want to > enable it if it causes planning times to become much longer on many > queries for which there is no benefit. Right. I haven’t had time to run any benchmarks yet, but that is something I need to do. > 2. I think there might be techniques we could use to limit planning > effort at an earlier stage when the approach doesn't appear promising. > For example, if the proposed grouping column is already unique, the > exercise is pointless (I think). Ideally we'd like to detect that > without even creating the grouped_rel. 
But the proposed grouping > column might also be *mostly* unique. For example, consider a table > with a million rows and a column with 500,000 distinct values. I suspect it > will be difficult for partial aggregation to work out to a win in a > case like this, because I think that the cost of performing the > partial aggregation will not reduce the cost either of the final > aggregation or of the intervening join steps by enough to compensate. > It would be best to find a way to avoid generating a lot of rels and > paths in cases where there's really not much hope of a win. > > One could, perhaps, imagine going further with this by postponing > eager aggregation planning until after regular paths have been built, > so that we have good cardinality estimates. Suppose the query joins a > single fact table to a series of dimension tables. The final plan thus > uses the fact table as the driving table and joins to the dimension > tables one by one. Do we really need to consider partial aggregation > at every level? Perhaps just where there's been a significant row > count reduction since the last time we tried it, but at the next level > the row count will increase again? > > Maybe there are other heuristics we could use in addition or instead. Yeah, one of my concerns with this work is that it can use significantly more CPU time and memory during planning once enabled. It would be great if we had some efficient heuristics to limit the effort. I'll work on that next and see what happens. > 3. In general, we are quite bad at estimating what will happen to the > row count after an aggregation, and we have no real idea what the > distribution of values will be. That might be a problem for this > patch, because it seems like the decisions we will make about where to > perform the partial aggregation might end up being quite random.
At > the top of the join tree, we'll need to compare directly aggregating > the best join path with various paths that involve a finalize > aggregation step at the top and a partial aggregation step further > down. But our cost estimates and row counts for the partial aggregate > steps seem like they will often be quite poor, which means that the > plans that use those partial aggregate steps might also be quite poor. > Even if they're not, I fear that comparing the cost of those > PartialAggregate-Join(s)-FinalizeAggregate paths to the direct > Aggregate path will look too much like comparing random numbers. We > need to know whether the combination of the FinalizeAggregate step and > the PartialAggregate step will be more or less expensive than a plain > old Aggregate, but how can we tell that if we don't have accurate > cardinality estimates? Yeah, I'm concerned about this too. In addition to the inaccuracies in aggregation estimates, our estimates for joins are sometimes not very accurate either. All of this is likely to result in regressions with eager aggregation in some cases. Currently I don't have a good answer to this problem. Maybe we can run some benchmarks first and investigate the regressions discovered on a case-by-case basis to better understand the specific issues. Thanks Richard ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-08-29 02:29 Richard Guo <[email protected]> parent: Tender Wang <[email protected]> 1 sibling, 0 replies; 30+ messages in thread From: Richard Guo @ 2024-08-29 02:29 UTC (permalink / raw) To: Tender Wang <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Robert Haas <[email protected]> On Wed, Aug 28, 2024 at 11:57 AM Tender Wang <[email protected]> wrote: > Recently, I ran some benchmark tests, mainly on tpch and tpcds. > The tpch tests showed no plan diffs, so I did not continue testing on tpch. > The tpcds (10GB) tests showed 22 plan diffs, as below: > 4.sql, 5.sql, 8.sql,11.sql,19.sql,23.sql,31.sql, 33.sql,39.sql,45.sql,46.sql,47.sql,53.sql, > 56.sql,57.sql,60.sql,63.sql,68.sql,74.sql,77.sql,80.sql,89.sql > > I haven't looked at all of them. I just picked a few simple plan tests (e.g. 19.sql, 45.sql). > For example, on 19.sql, eager agg pushdown doesn't get a large gain, but shows a small > performance regression. > > I will continue to benchmark this feature. Thank you for running the benchmarks. That really helps a lot. Thanks Richard ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-08-29 02:45 Richard Guo <[email protected]> parent: Robert Haas <[email protected]> 1 sibling, 2 replies; 30+ messages in thread From: Richard Guo @ 2024-08-29 02:45 UTC (permalink / raw) To: Robert Haas <[email protected]>; +Cc: Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] On Wed, Aug 28, 2024 at 9:01 PM Robert Haas <[email protected]> wrote: > On Tue, Aug 27, 2024 at 11:57 PM Tender Wang <[email protected]> wrote: > > I haven't looked at all of them. I just picked a few simple plan tests (e.g. 19.sql, 45.sql). > > For example, on 19.sql, eager agg pushdown doesn't get a large gain, but shows a small > > performance regression. > > Yeah, this is one of the things I was worried about in my previous > reply to Richard. It would be worth Richard, or someone, probing into > exactly why that's happening. My fear is that we just don't have good > enough estimates to make good decisions, but there might well be > another explanation. It's great that we have a query to probe into. Your guess is likely correct: it may be caused by poor estimates. Tender, would you please help provide the outputs of EXPLAIN (COSTS ON, ANALYZE) on 19.sql with and without eager aggregation? > > I will continue to benchmark this feature. Thanks again for running the benchmarks. Thanks Richard ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-08-29 03:22 Tender Wang <[email protected]> parent: Richard Guo <[email protected]> 1 sibling, 0 replies; 30+ messages in thread From: Tender Wang @ 2024-08-29 03:22 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Robert Haas <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] Richard Guo <[email protected]> wrote on Thu, Aug 29, 2024 at 10:46: > On Wed, Aug 28, 2024 at 9:01 PM Robert Haas <[email protected]> wrote: > > On Tue, Aug 27, 2024 at 11:57 PM Tender Wang <[email protected]> wrote: > > > I haven't looked at all of them. I just picked a few simple plan tests (e.g. 19.sql, 45.sql). > > > For example, on 19.sql, eager agg pushdown doesn't get a large gain, but shows a small > > > performance regression. > > > > Yeah, this is one of the things I was worried about in my previous > > reply to Richard. It would be worth Richard, or someone, probing into > > exactly why that's happening. My fear is that we just don't have good > > enough estimates to make good decisions, but there might well be > > another explanation. > > It's great that we have a query to probe into. Your guess is likely > correct: it may be caused by poor estimates. > > Tender, would you please help provide the outputs of > > EXPLAIN (COSTS ON, ANALYZE) > > on 19.sql with and without eager aggregation? > Yeah, in [1], 19_off.out and 19_on.out are the output of EXPLAIN (COSTS OFF, ANALYZE). I will run the EXPLAIN (COSTS ON, ANALYZE) tests and upload the results later today. [1] https://github.com/tenderwg/eager_agg -- Tender Wang ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-08-29 03:38 Tender Wang <[email protected]> parent: Richard Guo <[email protected]> 1 sibling, 1 reply; 30+ messages in thread From: Tender Wang @ 2024-08-29 03:38 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Robert Haas <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] Richard Guo <[email protected]> wrote on Thu, Aug 29, 2024 at 10:46: > On Wed, Aug 28, 2024 at 9:01 PM Robert Haas <[email protected]> wrote: > > On Tue, Aug 27, 2024 at 11:57 PM Tender Wang <[email protected]> wrote: > > > I haven't looked at all of them. I just picked a few simple plan tests (e.g. 19.sql, 45.sql). > > > For example, on 19.sql, eager agg pushdown doesn't get a large gain, but shows a small > > > performance regression. > > > > Yeah, this is one of the things I was worried about in my previous > > reply to Richard. It would be worth Richard, or someone, probing into > > exactly why that's happening. My fear is that we just don't have good > > enough estimates to make good decisions, but there might well be > > another explanation. > > It's great that we have a query to probe into. Your guess is likely > correct: it may be caused by poor estimates. > > Tender, would you please help provide the outputs of > > EXPLAIN (COSTS ON, ANALYZE) > > on 19.sql with and without eager aggregation? > I uploaded the EXPLAIN (COSTS ON, ANALYZE) results to [1]. I ran the same query three times and chose the result of the third run. You can check 19_off_explain.out and 19_on_explain.out. [1] https://github.com/tenderwg/eager_agg -- Tender Wang ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-08-29 12:40 Robert Haas <[email protected]> parent: Richard Guo <[email protected]> 1 sibling, 0 replies; 30+ messages in thread From: Robert Haas @ 2024-08-29 12:40 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] On Wed, Aug 28, 2024 at 10:26 PM Richard Guo <[email protected]> wrote: > Yeah, I'm concerned about this too. In addition to the inaccuracies > in aggregation estimates, our estimates for joins are sometimes not > very accurate either. All of this is likely to result in regressions > with eager aggregation in some cases. Currently I don't have a good > answer to this problem. Maybe we can run some benchmarks first and > investigate the regressions discovered on a case-by-case basis to better > understand the specific issues. While it's true that we can make mistakes during join estimation, I believe aggregate estimation tends to be far worse. -- Robert Haas EDB: http://www.enterprisedb.com ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-08-29 13:02 Robert Haas <[email protected]> parent: Tender Wang <[email protected]> 0 siblings, 0 replies; 30+ messages in thread From: Robert Haas @ 2024-08-29 13:02 UTC (permalink / raw) To: Tender Wang <[email protected]>; +Cc: Richard Guo <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] On Wed, Aug 28, 2024 at 11:38 PM Tender Wang <[email protected]> wrote: > I uploaded the EXPLAIN (COSTS ON, ANALYZE) results to [1]. > I ran the same query three times and chose the result of the third run. > You can check 19_off_explain.out and 19_on_explain.out. So, in 19_off_explain.out, we got this: -> Finalize GroupAggregate (cost=666986.48..667015.35 rows=187 width=142) (actual time=272.649..334.318 rows=900 loops=1) -> Gather Merge (cost=666986.48..667010.21 rows=187 width=142) (actual time=272.644..333.847 rows=901 loops=1) -> Partial GroupAggregate (cost=665986.46..665988.60 rows=78 width=142) (actual time=266.379..267.476 rows=300 loops=3) -> Sort (cost=665986.46..665986.65 rows=78 width=116) (actual time=266.367..266.583 rows=5081 loops=3) And in 19_on_explain.out, we got this: -> Finalize GroupAggregate (cost=666987.03..666989.77 rows=19 width=142) (actual time=285.018..357.374 rows=900 loops=1) -> Gather Merge (cost=666987.03..666989.25 rows=19 width=142) (actual time=285.000..352.793 rows=15242 loops=1) -> Sort (cost=665987.01..665987.03 rows=8 width=142) (actual time=273.391..273.580 rows=5081 loops=3) -> Nested Loop (cost=665918.00..665986.89 rows=8 width=142) (actual time=252.667..269.719 rows=5081 loops=3) -> Nested Loop (cost=665917.85..665985.43 rows=8 width=157) (actual time=252.656..264.755 rows=5413 loops=3) -> Partial GroupAggregate (cost=665917.43..665920.10 rows=82 width=150) (actual time=252.643..255.627 rows=5413 loops=3) -> Sort (cost=665917.43..665917.64 rows=82 width=124) (actual time=252.636..252.927 rows=5413 loops=3) So, the patch was expected to 
cause
the number of rows passing through the Gather Merge to decrease from 187 to 19, but actually caused the number of rows passing through the Gather Merge to increase from 901 to 15242. When the PartialAggregate was positioned at the top of the join tree, it reduced the number of rows from 5081 to 300; but when it was pushed down below two joins, it didn't reduce the row count at all, and the subsequent two joins reduced it by less than 10%. Now, you could complain about the fact that the Parallel Hash Join isn't well-estimated here, but my question is: why does the planner think that the PartialAggregate should go specifically here? In both plans, the PartialAggregate isn't expected to change the row count. And if that is true, then it's going to be cheapest to do it at the point where the joins have reduced the row count to the minimum value. Here, that would be at the top of the plan tree, where we have only 5081 estimated rows, but instead, the patch chooses to do it as soon as we have all of the grouping columns, when we still have 5413 rows. I don't understand why that path wins on cost, unless it's just that the paths compare fuzzily the same, in which case it kind of goes to my earlier point about not really having the statistics to know which way is actually going to be better. -- Robert Haas EDB: http://www.enterprisedb.com ^ permalink raw reply [nested|flat] 30+ messages in thread
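The row-count argument in the message above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the figures are the actual per-worker row counts copied from the two EXPLAIN ANALYZE excerpts, and the helper name `reduction` is hypothetical, not something from the patch.

```python
def reduction(rows_in, rows_out):
    """Fraction of rows removed between a node's input and its output."""
    return (rows_in - rows_out) / rows_in


# 19_off: the Partial GroupAggregate sits above all joins (Sort -> Partial GroupAggregate).
agg_at_top = reduction(5081, 300)

# 19_on: the Partial GroupAggregate is pushed below two joins.
agg_pushed_down = reduction(5413, 5413)  # aggregate input rows vs. output rows
joins_above_agg = reduction(5413, 5081)  # effect of the two nested loops above it

print(f"aggregate at top of join tree: {agg_at_top:.1%} of rows removed")
print(f"aggregate pushed below joins:  {agg_pushed_down:.1%} of rows removed")
print(f"two joins above the aggregate: {joins_above_agg:.1%} of rows removed")
```

Under these numbers the pushed-down aggregate removes nothing, which is why placing it where the joins have already minimized the row count looks like the cheaper choice.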
* Re: Eager aggregation, take 3 @ 2024-09-04 03:48 Tender Wang <[email protected]> parent: Richard Guo <[email protected]> 4 siblings, 1 reply; 30+ messages in thread From: Tender Wang @ 2024-09-04 03:48 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] Richard Guo <[email protected]> wrote on Wed, Aug 21, 2024 at 15:11: > On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <[email protected]> > wrote: > > I had a self-review of this patchset and made some refactoring, > > especially to the function that creates the RelAggInfo structure for a > > given relation. While there were no major changes, the code should > > now be simpler. > > I found a bug in v10 patchset: when we generate the GROUP BY clauses > for the partial aggregation that is pushed down to a non-aggregated > relation, we may produce a clause with a tleSortGroupRef that > duplicates one already present in the query's groupClause, which would > cause problems. > > Attached is the updated version of the patchset that fixes this bug > and includes further code refactoring. > Applying v11-0002 with git am failed on HEAD (6c2b5edecc). tender@iZ2ze6la2dizi7df9q3xheZ:/workspace/postgres$ git am v11-0002-Implement-Eager-Aggregation.patch Applying: Implement Eager Aggregation error: patch failed: src/test/regress/parallel_schedule:119 error: src/test/regress/parallel_schedule: patch does not apply Patch failed at 0001 Implement Eager Aggregation hint: Use 'git am --show-current-patch=diff' to see the failed patch When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". -- Thanks, Tender Wang ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-09-05 01:40 Tender Wang <[email protected]> parent: Richard Guo <[email protected]> 4 siblings, 1 reply; 30+ messages in thread From: Tender Wang @ 2024-09-05 01:40 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Robert Haas <[email protected]> Richard Guo <[email protected]> wrote on Wed, Aug 21, 2024 at 15:11: > On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <[email protected]> > wrote: > > I had a self-review of this patchset and made some refactoring, > > especially to the function that creates the RelAggInfo structure for a > > given relation. While there were no major changes, the code should > > now be simpler. > > I found a bug in v10 patchset: when we generate the GROUP BY clauses > for the partial aggregation that is pushed down to a non-aggregated > relation, we may produce a clause with a tleSortGroupRef that > duplicates one already present in the query's groupClause, which would > cause problems. > > Attached is the updated version of the patchset that fixes this bug > and includes further code refactoring. I reviewed the v11 patch set, and here are a few of my thoughts: 1. In setup_eager_aggregation(), before calling create_agg_clause_infos(), it performs some checks on whether eager aggregation is available. Can we move those checks into a function, for example, can_eager_agg(), like can_partial_agg() does? 2. I found that outside of joinrel.c we all use IS_DUMMY_REL, but in joinrel.c, Tom always uses is_dummy_rel(). Other committers use IS_DUMMY_REL. 3. The attached patch does not consider FDW when creating a path for grouped_rel or grouped_join. Do we need to think about FDW? I haven't finished reviewing the patch set. I will continue to study this feature. -- Thanks, Tender Wang ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-09-11 02:52 Tender Wang <[email protected]> parent: Richard Guo <[email protected]> 4 siblings, 1 reply; 30+ messages in thread From: Tender Wang @ 2024-09-11 02:52 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Robert Haas <[email protected]> Richard Guo <[email protected]> wrote on Wed, Aug 21, 2024 at 15:11: > On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <[email protected]> > wrote: > > I had a self-review of this patchset and made some refactoring, > > especially to the function that creates the RelAggInfo structure for a > > given relation. While there were no major changes, the code should > > now be simpler. > > I found a bug in v10 patchset: when we generate the GROUP BY clauses > for the partial aggregation that is pushed down to a non-aggregated > relation, we may produce a clause with a tleSortGroupRef that > duplicates one already present in the query's groupClause, which would > cause problems. > > Attached is the updated version of the patchset that fixes this bug > and includes further code refactoring. > > I continued reviewing the v11 patches. Here are some of my thoughts. 1. In make_one_rel(), we have the code below: /* * Build grouped base relations for each base rel if possible. */ setup_base_grouped_rels(root); As far as I know, each base rel only has one grouped base relation, if possible. The comment could be changed to "Build a grouped base relation for each base rel if possible." 2. According to the comments of generate_grouped_paths(), we may generate paths for a grouped relation on top of paths of a join relation. So the "rel_plain" argument in generate_grouped_paths() may be confusing. "plain" usually means "base rel". How about renaming rel_plain to input_rel? 3. In create_partial_grouping_paths(), the partially_grouped_rel could have been already created due to eager aggregation.
If partially_grouped_rel exists, its reltarget has been created. So do we need below logic? /* * Build target list for partial aggregate paths. These paths cannot just * emit the same tlist as regular aggregate paths, because (1) we must * include Vars and Aggrefs needed in HAVING, which might not appear in * the result tlist, and (2) the Aggrefs must be set in partial mode. */ partially_grouped_rel->reltarget = make_partial_grouping_target(root, grouped_rel->reltarget, extra->havingQual); -- Thanks, Tender Wang ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-09-13 07:48 Tender Wang <[email protected]> parent: Tender Wang <[email protected]> 0 siblings, 1 reply; 30+ messages in thread From: Tender Wang @ 2024-09-13 07:48 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] Tender Wang <[email protected]> wrote on Wed, Sep 4, 2024 at 11:48: > > > Richard Guo <[email protected]> wrote on Wed, Aug 21, 2024 at 15:11: > >> On Fri, Aug 16, 2024 at 4:14 PM Richard Guo <[email protected]> >> wrote: >> > I had a self-review of this patchset and made some refactoring, >> > especially to the function that creates the RelAggInfo structure for a >> > given relation. While there were no major changes, the code should >> > now be simpler. >> >> I found a bug in v10 patchset: when we generate the GROUP BY clauses >> for the partial aggregation that is pushed down to a non-aggregated >> relation, we may produce a clause with a tleSortGroupRef that >> duplicates one already present in the query's groupClause, which would >> cause problems. >> >> Attached is the updated version of the patchset that fixes this bug >> and includes further code refactoring. >> > > Applying v11-0002 with git am failed on HEAD (6c2b5edecc). > > tender@iZ2ze6la2dizi7df9q3xheZ:/workspace/postgres$ git am > v11-0002-Implement-Eager-Aggregation.patch > Applying: Implement Eager Aggregation > error: patch failed: src/test/regress/parallel_schedule:119 > error: src/test/regress/parallel_schedule: patch does not apply > Patch failed at 0001 Implement Eager Aggregation > hint: Use 'git am --show-current-patch=diff' to see the failed patch > When you have resolved this problem, run "git am --continue". > If you prefer to skip this patch, run "git am --skip" instead. > To restore the original branch and stop patching, run "git am --abort".
> > Since MERGE/SPLIT partition has been reverted, the tests *partition_merge* and *partition_split* should be removed from parallel_schedule. After doing the above, the 0002 patch can be applied. -- Thanks, Tender Wang ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-09-25 03:20 Richard Guo <[email protected]> parent: Robert Haas <[email protected]> 1 sibling, 2 replies; 30+ messages in thread From: Richard Guo @ 2024-09-25 03:20 UTC (permalink / raw) To: Robert Haas <[email protected]>; +Cc: Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] On Wed, Aug 28, 2024 at 9:01 PM Robert Haas <[email protected]> wrote: > On Tue, Aug 27, 2024 at 11:57 PM Tender Wang <[email protected]> wrote: > > I haven't looked at all of them. I just picked a few simple plan tests (e.g. 19.sql, 45.sql). > > For example, on 19.sql, eager agg pushdown doesn't get a large gain, but shows a small > > performance regression. > > Yeah, this is one of the things I was worried about in my previous > reply to Richard. It would be worth Richard, or someone, probing into > exactly why that's happening. My fear is that we just don't have good > enough estimates to make good decisions, but there might well be > another explanation. Sorry it took some time to switch back to this thread. I revisited the part about cost estimates for grouped paths in this patch, and I found a big issue: the row estimate for a join path could be significantly inaccurate if there is a grouped join path beneath it. The reason is that it is very tricky to set the size estimates for a grouped join relation. For a non-grouped join relation, we know that all its paths have the same rowcount estimate (well, in theory). But this is not true for a grouped join relation. Suppose we have a grouped join relation for t1/t2 join. There might be two paths for it: Aggregate -> Join -> Scan on t1 -> Scan on t2 Or Join -> Scan on t1 -> Aggregate -> Scan on t2 These two paths can have very different rowcount estimates, and we have no way of knowing which one to set for this grouped join relation, because we do not know which path would be picked in the final plan.
This issue can be illustrated with the query below. create table t (a int, b int, c int); insert into t select i%10, i%10, i%10 from generate_series(1,1000)i; analyze t; set enable_eager_aggregate to on; explain (costs on) select sum(t2.c) from t t1 join t t2 on t1.a = t2.a join t t3 on t2.b = t3.b group by t3.a; QUERY PLAN --------------------------------------------------------------------------------------- Finalize HashAggregate (cost=6840.60..6840.70 rows=10 width=12) Group Key: t3.a -> Nested Loop (cost=1672.00..1840.60 rows=1000000 width=12) Join Filter: (t2.b = t3.b) -> Partial HashAggregate (cost=1672.00..1672.10 rows=10 width=12) Group Key: t2.b -> Hash Join (cost=28.50..1172.00 rows=100000 width=8) Hash Cond: (t1.a = t2.a) -> Seq Scan on t t1 (cost=0.00..16.00 rows=1000 width=4) -> Hash (cost=16.00..16.00 rows=1000 width=12) -> Seq Scan on t t2 (cost=0.00..16.00 rows=1000 width=12) -> Materialize (cost=0.00..21.00 rows=1000 width=8) -> Seq Scan on t t3 (cost=0.00..16.00 rows=1000 width=8) (13 rows) Look at the Nested Loop node: -> Nested Loop (cost=1672.00..1840.60 rows=1000000 width=12) How can a 10-row outer path joining a 1000-row inner path generate 1000000 rows? This is because we are using the plan of the first path described above, and the rowcount estimate of the second path. What a kluge! To address this issue, one solution I’m considering is to recalculate the row count estimate for a grouped join path using its outer and inner paths. While this may seem expensive, it might not be that bad since we will cache the results of the selectivity calculation. In fact, this is already the approach we take for parameterized join paths (see get_parameterized_joinrel_size). Any thoughts on this? Thanks Richard ^ permalink raw reply [nested|flat] 30+ messages in thread
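The mismatch described above can be reproduced with back-of-the-envelope arithmetic. This is a simplified sketch, not the planner's code. It assumes an equijoin selectivity of 1/10 (each column of t holds 10 distinct values) and assumes the grouped t1/t2 relation carried a 10000-row estimate, which is the value that reproduces the 1000000 figure in the plan. `join_rows` is a hypothetical helper.

```python
def join_rows(outer_rows, inner_rows, ndistinct):
    """Estimated equijoin output rows, assuming selectivity 1/ndistinct."""
    return outer_rows * inner_rows // ndistinct


T3_ROWS = 1000     # Seq Scan on t t3
NDISTINCT_B = 10   # i % 10 yields 10 distinct values of b

# Assumed estimate attached to the grouped t1/t2 relation as a whole,
# i.e. taken from a different path than the one used in the plan.
rel_level_rows = 10000
# Estimate of the path actually used as the outer side (Partial HashAggregate).
path_level_rows = 10

print(join_rows(rel_level_rows, T3_ROWS, NDISTINCT_B))   # matches rows=1000000 in the plan
print(join_rows(path_level_rows, T3_ROWS, NDISTINCT_B))  # what the 10-row outer path implies
```

Deriving the join estimate from the outer and inner paths themselves, rather than from a single relation-level number, is what removes the discrepancy in this toy calculation.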
* Re: Eager aggregation, take 3 @ 2024-09-25 06:55 Richard Guo <[email protected]> parent: Tender Wang <[email protected]> 0 siblings, 0 replies; 30+ messages in thread From: Richard Guo @ 2024-09-25 06:55 UTC (permalink / raw) To: Tender Wang <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Robert Haas <[email protected]> On Thu, Sep 5, 2024 at 9:40 AM Tender Wang <[email protected]> wrote: > 1. In setup_eager_aggregation(), before calling create_agg_clause_infos(), it performs > some checks on whether eager aggregation is available. Can we move those checks into a function, > for example, can_eager_agg(), like can_partial_agg() does? We can do this, but I'm not sure this would be better. > 2. I found that outside of joinrel.c we all use IS_DUMMY_REL, but in joinrel.c, Tom always uses > is_dummy_rel(). Other committers use IS_DUMMY_REL. They are essentially the same: IS_DUMMY_REL() is a macro that wraps is_dummy_rel(). I think they are interchangeable, and I don’t have a preference for which one is better. > 3. The attached patch does not consider FDW when creating a path for grouped_rel or grouped_join. > Do we need to think about FDW? We may add support for foreign relations in the future, but for now, I think we'd better not expand the scope too much until we ensure that everything is working correctly. Thanks Richard ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-09-25 07:02 Richard Guo <[email protected]> parent: Tender Wang <[email protected]> 0 siblings, 1 reply; 30+ messages in thread From: Richard Guo @ 2024-09-25 07:02 UTC (permalink / raw) To: Tender Wang <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]; Robert Haas <[email protected]> On Wed, Sep 11, 2024 at 10:52 AM Tender Wang <[email protected]> wrote: > 1. In make_one_rel(), we have the code below: > /* > * Build grouped base relations for each base rel if possible. > */ > setup_base_grouped_rels(root); > > As far as I know, each base rel only has one grouped base relation, if possible. > The comment could be changed to "Build a grouped base relation for each base rel if possible." Yeah, each base rel has only one grouped rel. However, there is a comment nearby stating 'consider_parallel flags for each base rel', which confuses me about whether it should be singular or plural in this context. Perhaps someone more proficient in English could clarify this. > 2. According to the comments of generate_grouped_paths(), we may generate paths for a grouped > relation on top of paths of a join relation. So the "rel_plain" argument in generate_grouped_paths() may be > confusing. "plain" usually means "base rel". How about renaming rel_plain to input_rel? I don't think 'plain relation' necessarily means 'base relation'. In this context I think it can mean 'non-grouped relation'. But maybe I'm wrong. > 3. In create_partial_grouping_paths(), the partially_grouped_rel could have been already created due to eager > aggregation. If partially_grouped_rel exists, its reltarget has been created. So do we need the logic below? > > /* > * Build target list for partial aggregate paths.
These paths cannot just > * emit the same tlist as regular aggregate paths, because (1) we must > * include Vars and Aggrefs needed in HAVING, which might not appear in > * the result tlist, and (2) the Aggrefs must be set in partial mode. > */ > partially_grouped_rel->reltarget = > make_partial_grouping_target(root, grouped_rel->reltarget, > extra->havingQual); Yeah, maybe we can avoid building the target list here for partially_grouped_rel that is generated by eager aggregation. Thanks Richard ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-09-25 07:12 Richard Guo <[email protected]> parent: Tender Wang <[email protected]> 0 siblings, 0 replies; 30+ messages in thread From: Richard Guo @ 2024-09-25 07:12 UTC (permalink / raw) To: Tender Wang <[email protected]>; +Cc: Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] On Fri, Sep 13, 2024 at 3:48 PM Tender Wang <[email protected]> wrote: > Since MERGE/SPLIT partition has been reverted, the tests *partition_merge* and *partition_split* should be removed > from parallel_schedule. After doing the above, the 0002 patch can be applied. Yeah, that's what I need to do. Thanks Richard ^ permalink raw reply [nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3 @ 2024-09-27 03:53 Richard Guo <[email protected]> parent: Richard Guo <[email protected]> 1 sibling, 0 replies; 30+ messages in thread From: Richard Guo @ 2024-09-27 03:53 UTC (permalink / raw) To: Robert Haas <[email protected]>; +Cc: Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected] On Wed, Sep 25, 2024 at 11:20 AM Richard Guo <[email protected]> wrote: > Look at the Nested Loop node: > > -> Nested Loop (cost=1672.00..1840.60 rows=1000000 width=12) > > How can a 10-row outer path joining a 1000-row inner path generate > 1000000 rows? This is because we are using the plan of the first path > described above, and the rowcount estimate of the second path. What a > kluge! > > To address this issue, one solution I’m considering is to recalculate > the row count estimate for a grouped join path using its outer and > inner paths. While this may seem expensive, it might not be that bad > since we will cache the results of the selectivity calculation. In > fact, this is already the approach we take for parameterized join > paths (see get_parameterized_joinrel_size). Here is an updated version of this patch that fixes the rowcount estimate issue along these lines (see set_joinpath_size). Now the Nested Loop node looks like: -> Nested Loop (cost=1672.00..1840.60 rows=1000 width=12) (actual time=119.685..122.841 rows=1000 loops=1) Its rowcount estimate looks much more sane now. But wait, why are we using nestloop here? My experience suggests that hashjoin typically outperforms nestloop with input paths of this size on this type of dataset. The thing is, the first path (join-then-aggregate one) of the t1/t2 grouped join relation has a much smaller rowcount but higher costs: :path.rows 10 :path.disabled_nodes 0 :path.startup_cost 1672 :path.total_cost 1672.1 And the second path (aggregate-then-join one) has cheaper costs but more rows.
:jpath.path.rows 10000 :jpath.path.disabled_nodes 0 :jpath.path.startup_cost 25.75 :jpath.path.total_cost 156.75 Both paths have survived the add_path() tournament for this relation, and the second one is selected as the cheapest path by set_cheapest, which mainly uses costs and then pathkeys as the selection criterion. The rowcount estimate is not taken into account, which is reasonable because unparameterized paths for the same relation usually have the same rowcount estimate. And when creating hashjoins, we only consider the cheapest input paths. This is why we are unable to generate a hashjoin with the first path. However, the situation changes with grouped relations, as different paths of a grouped relation can have very different row counts. To cope with this, I modified set_cheapest() to also find the fewest-row unparameterized path if the relation is a grouped relation, and include it in the cheapest_parameterized_paths list. It could be argued that this will increase the overall planning time a lot because it adds one more path to cheapest_parameterized_paths. But in many cases the fewest-row-path is the same path as cheapest_total_path, in which case we do not need to add it again. 
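As a rough illustration of the selection rule described above, here is a minimal sketch, not the patch's C code. The `Path` class and `pick_paths` function are hypothetical names, and the sketch deliberately ignores pathkeys, parameterization, and startup-cost comparisons that the real set_cheapest() handles. The row and cost figures are the two path dumps quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    name: str
    rows: int
    startup_cost: float
    total_cost: float


def pick_paths(paths, is_grouped_rel):
    """Return (cheapest_total_path, extra_fewest_rows_path_or_None).

    For a grouped relation, also remember the path with the fewest rows,
    unless it is the same path as the cheapest one.
    """
    cheapest = min(paths, key=lambda p: p.total_cost)
    fewest = None
    if is_grouped_rel:
        candidate = min(paths, key=lambda p: p.rows)
        if candidate is not cheapest:
            fewest = candidate
    return cheapest, fewest


paths = [
    Path("join-then-aggregate", rows=10, startup_cost=1672.0, total_cost=1672.1),
    Path("aggregate-then-join", rows=10000, startup_cost=25.75, total_cost=156.75),
]
cheapest, fewest = pick_paths(paths, is_grouped_rel=True)
print(cheapest.name, "| extra candidate:", fewest.name if fewest else None)
```

With both candidates retained, an upper join can be built on the 10-row path even though it is not the cheapest one by cost.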
And now the plan becomes: explain (costs on) select sum(t2.c) from t t1 join t t2 on t1.a = t2.a join t t3 on t2.b = t3.b group by t3.a; QUERY PLAN --------------------------------------------------------------------------------------------- Finalize HashAggregate (cost=1706.97..1707.07 rows=10 width=12) Group Key: t3.a -> Hash Join (cost=1672.22..1701.97 rows=1000 width=12) Hash Cond: (t3.b = t2.b) -> Seq Scan on t t3 (cost=0.00..16.00 rows=1000 width=8) -> Hash (cost=1672.10..1672.10 rows=10 width=12) -> Partial HashAggregate (cost=1672.00..1672.10 rows=10 width=12) Group Key: t2.b -> Hash Join (cost=28.50..1172.00 rows=100000 width=8) Hash Cond: (t1.a = t2.a) -> Seq Scan on t t1 (cost=0.00..16.00 rows=1000 width=4) -> Hash (cost=16.00..16.00 rows=1000 width=12) -> Seq Scan on t t2 (cost=0.00..16.00 rows=1000 width=12) (13 rows) I believe this is the most optimal plan we can find for this query on this dataset. I also made some changes to how grouped relations are stored in this version of the patch. Thanks Richard Attachments: [application/octet-stream] v12-0001-Implement-Eager-Aggregation.patch (172.9K, 2-v12-0001-Implement-Eager-Aggregation.patch) download | inline diff: From 20078e2b09402323302d8a29f412b3e0f46bf014 Mon Sep 17 00:00:00 2001 From: Richard Guo <[email protected]> Date: Tue, 11 Jun 2024 15:59:19 +0900 Subject: [PATCH v12] Implement Eager Aggregation Eager aggregation is a query optimization technique that partially pushes aggregation past a join, and finalizes it once all the relations are joined. Eager aggregation may reduce the number of input rows to the join and thus could result in a better overall plan. 
A plan with eager aggregation looks like: EXPLAIN (COSTS OFF) SELECT a.i, avg(b.y) FROM a JOIN b ON a.i = b.j GROUP BY a.i; Finalize HashAggregate Group Key: a.i -> Nested Loop -> Partial HashAggregate Group Key: b.j -> Seq Scan on b -> Index Only Scan using a_pkey on a Index Cond: (i = b.j) During the construction of the join tree, we evaluate each base or join relation to determine if eager aggregation can be applied. If feasible, we create a separate RelOptInfo called a "grouped relation" and store it in a dedicated list. Grouped relation paths can be generated in two ways. The first method involves adding sorted and hashed partial aggregation paths on top of the non-grouped paths. To limit planning time, we only consider the cheapest or suitably-sorted non-grouped paths during this phase. Alternatively, grouped paths can be generated by joining a grouped relation with a non-grouped relation. Joining two grouped relations does not seem to be very useful and is currently not supported. For the partial aggregation that is pushed down to a non-aggregated relation, we need to consider all expressions from this relation that are involved in upper join clauses and include them in the grouping keys. This ensures that we have the correct input for the upper joins and that an aggregated row from the partial aggregation matches the other side of the join if and only if each row in the partial group does, which is crucial for maintaining correctness. One restriction is that we cannot push partial aggregation down to a relation that is in the nullable side of an outer join, because the NULL-extended rows produced by the outer join would not be available when we perform the partial aggregation, while with a non-eager-aggregation plan these rows are available for the top-level aggregation. Pushing partial aggregation in this case may result in the rows being grouped differently than expected, or produce incorrect values from the aggregate functions. 
If we have generated a grouped relation for the topmost join relation, we finalize its paths at the end. The final paths will compete in the usual way with paths built from regular planning. Since eager aggregation can generate many grouped relations, we introduce a RelInfoList structure, which encapsulates both a list and a hash table, so that we can leverage the hash table for faster lookups not only for join relations but also for grouped relations. Eager aggregation can use significantly more CPU time and memory than regular planning when the query involves aggregates and many joining relations. However, in some cases, the resulting plan can be much better, justifying the additional planning effort. All the same, for now, turn this feature off by default. --- contrib/postgres_fdw/postgres_fdw.c | 3 +- src/backend/optimizer/README | 79 + src/backend/optimizer/geqo/geqo_eval.c | 98 +- src/backend/optimizer/path/allpaths.c | 448 +++++- src/backend/optimizer/path/costsize.c | 102 +- src/backend/optimizer/path/joinrels.c | 131 ++ src/backend/optimizer/plan/initsplan.c | 259 ++++ src/backend/optimizer/plan/planmain.c | 17 +- src/backend/optimizer/plan/planner.c | 99 +- src/backend/optimizer/util/appendinfo.c | 60 + src/backend/optimizer/util/pathnode.c | 47 +- src/backend/optimizer/util/relnode.c | 708 ++++++++- src/backend/utils/misc/guc_tables.c | 10 + src/backend/utils/misc/postgresql.conf.sample | 1 + src/include/nodes/pathnodes.h | 142 +- src/include/optimizer/pathnode.h | 7 + src/include/optimizer/paths.h | 5 + src/include/optimizer/planmain.h | 1 + src/test/regress/expected/eager_aggregate.out | 1308 +++++++++++++++++ src/test/regress/expected/sysviews.out | 3 +- src/test/regress/parallel_schedule | 2 +- src/test/regress/sql/eager_aggregate.sql | 192 +++ src/tools/pgindent/typedefs.list | 7 +- 23 files changed, 3572 insertions(+), 157 deletions(-) create mode 100644 src/test/regress/expected/eager_aggregate.out create mode 100644 
src/test/regress/sql/eager_aggregate.sql diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index adc62576d1..48b0488184 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -6092,7 +6092,8 @@ foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype, */ Assert(fpinfo->relation_index == 0); /* shouldn't be set yet */ fpinfo->relation_index = - list_length(root->parse->rtable) + list_length(root->join_rel_list); + list_length(root->parse->rtable) + + list_length(root->join_rel_list->items); return true; } diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README index 2ab4f3dbf3..008c700aea 100644 --- a/src/backend/optimizer/README +++ b/src/backend/optimizer/README @@ -1497,3 +1497,82 @@ breaking down aggregation or grouping over a partitioned relation into aggregation or grouping over its partitions is called partitionwise aggregation. Especially when the partition keys match the GROUP BY clause, this can be significantly faster than the regular method. + +Eager aggregation +----------------- + +Eager aggregation is a query optimization technique that partially pushes +aggregation past a join, and finalizes it once all the relations are joined. +Eager aggregation may reduce the number of input rows to the join and thus +could result in a better overall plan. + +For example: + + EXPLAIN (COSTS OFF) + SELECT a.i, avg(b.y) + FROM a JOIN b ON a.i = b.j + GROUP BY a.i; + + Finalize HashAggregate + Group Key: a.i + -> Nested Loop + -> Partial HashAggregate + Group Key: b.j + -> Seq Scan on b + -> Index Only Scan using a_pkey on a + Index Cond: (i = b.j) + +If the partial aggregation on table B significantly reduces the number of +input rows, the join above will be much cheaper, leading to a more efficient +final plan. 
+ +For the partial aggregation that is pushed down to a non-aggregated relation, +we need to consider all expressions from this relation that are involved in +upper join clauses and include them in the grouping keys. This ensures that we +have the correct input for the upper joins and that an aggregated row from the +partial aggregation matches the other side of the join if and only if each row +in the partial group does, which is crucial for maintaining correctness. + +One restriction is that we cannot push partial aggregation down to a relation +that is in the nullable side of an outer join, because the NULL-extended rows +produced by the outer join would not be available when we perform the partial +aggregation, while with a non-eager-aggregation plan these rows are available +for the top-level aggregation. Pushing partial aggregation in this case may +result in the rows being grouped differently than expected, or produce +incorrect values from the aggregate functions. + +We can also apply eager aggregation to a join: + + EXPLAIN (COSTS OFF) + SELECT a.i, avg(b.y + c.z) + FROM a JOIN b ON a.i = b.j + JOIN c ON b.j = c.i + GROUP BY a.i; + + Finalize HashAggregate + Group Key: a.i + -> Nested Loop + -> Partial HashAggregate + Group Key: b.j + -> Hash Join + Hash Cond: (b.j = c.i) + -> Seq Scan on b + -> Hash + -> Seq Scan on c + -> Index Only Scan using a_pkey on a + Index Cond: (i = b.j) + +During the construction of the join tree, we evaluate each base or join +relation to determine if eager aggregation can be applied. If feasible, we +create a separate RelOptInfo called a "grouped relation" and generate grouped +paths by adding sorted and hashed partial aggregation paths on top of the +non-grouped paths. To limit planning time, we consider only the cheapest +non-grouped paths in this step. + +Another way to generate grouped paths is to join a grouped relation with a +non-grouped relation. 
Joining two grouped relations does not seem to be very +useful and is currently not supported. + +If we have generated a grouped relation for the topmost join relation, we need +to finalize its paths at the end. The final paths will compete in the usual +way with paths built from regular planning. diff --git a/src/backend/optimizer/geqo/geqo_eval.c b/src/backend/optimizer/geqo/geqo_eval.c index d2f7f4e5f3..cdc9543135 100644 --- a/src/backend/optimizer/geqo/geqo_eval.c +++ b/src/backend/optimizer/geqo/geqo_eval.c @@ -39,10 +39,20 @@ typedef struct int size; /* number of input relations in clump */ } Clump; +/* The original length and hashtable of a RelInfoList */ +typedef struct +{ + int savelength; + struct HTAB *savehash; +} RelInfoListInfo; + static List *merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene, bool force); static bool desirable_join(PlannerInfo *root, RelOptInfo *outer_rel, RelOptInfo *inner_rel); +static RelInfoListInfo save_relinfolist(RelInfoList *relinfo_list); +static void restore_relinfolist(RelInfoList *relinfo_list, + RelInfoListInfo *info); /* @@ -60,8 +70,8 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) MemoryContext oldcxt; RelOptInfo *joinrel; Cost fitness; - int savelength; - struct HTAB *savehash; + RelInfoListInfo save_join_rel; + RelInfoListInfo save_grouped_rel; /* * Create a private memory context that will hold all temp storage @@ -78,25 +88,29 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) oldcxt = MemoryContextSwitchTo(mycontext); /* - * gimme_tree will add entries to root->join_rel_list, which may or may - * not already contain some entries. The newly added entries will be - * recycled by the MemoryContextDelete below, so we must ensure that the - * list is restored to its former state before exiting. We can do this by - * truncating the list to its original length. NOTE this assumes that any - * added entries are appended at the end! 
+ * gimme_tree will add entries to root->join_rel_list and + * root->grouped_rel_list, which may or may not already contain some + * entries. The newly added entries will be recycled by the + * MemoryContextDelete below, so we must ensure that each list within the + * RelInfoList structures is restored to its former state before exiting. + * We can do this by truncating each list to its original length. NOTE + * this assumes that any added entries are appended at the end! * - * We also must take care not to mess up the outer join_rel_hash, if there - * is one. We can do this by just temporarily setting the link to NULL. - * (If we are dealing with enough join rels, which we very likely are, a - * new hash table will get built and used locally.) + * We also must take care not to mess up the outer hash tables within the + * RelInfoList structures, if any. We can do this by just temporarily + * setting each link to NULL. (If we are dealing with enough join rels or + * grouped rels, which we very likely are, new hash tables will get built + * and used locally.) * * join_rel_level[] shouldn't be in use, so just Assert it isn't. */ - savelength = list_length(root->join_rel_list); - savehash = root->join_rel_hash; + save_join_rel = save_relinfolist(root->join_rel_list); + save_grouped_rel = save_relinfolist(root->grouped_rel_list); + Assert(root->join_rel_level == NULL); - root->join_rel_hash = NULL; + root->join_rel_list->hash = NULL; + root->grouped_rel_list->hash = NULL; /* construct the best path for the given combination of relations */ joinrel = gimme_tree(root, tour, num_gene); @@ -118,12 +132,11 @@ geqo_eval(PlannerInfo *root, Gene *tour, int num_gene) fitness = DBL_MAX; /* - * Restore join_rel_list to its former state, and put back original - * hashtable if any. + * Restore each of the lists in join_rel_list and grouped_rel_list to its + * former state, and put back original hashtables if any.
*/ - root->join_rel_list = list_truncate(root->join_rel_list, - savelength); - root->join_rel_hash = savehash; + restore_relinfolist(root->join_rel_list, &save_join_rel); + restore_relinfolist(root->grouped_rel_list, &save_grouped_rel); /* release all the memory acquired within gimme_tree */ MemoryContextSwitchTo(oldcxt); @@ -279,6 +292,27 @@ merge_clump(PlannerInfo *root, List *clumps, Clump *new_clump, int num_gene, /* Find and save the cheapest paths for this joinrel */ set_cheapest(joinrel); + /* + * Except for the topmost scan/join rel, consider generating + * partial aggregation paths for the grouped relation on top + * of the paths of this rel. After that, we're done creating + * paths for the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(joinrel->relids, root->all_query_rels)) + { + RelOptInfo *rel_grouped; + + rel_grouped = find_grouped_rel(root, joinrel->relids); + if (rel_grouped) + { + Assert(IS_GROUPED_REL(rel_grouped)); + + generate_grouped_paths(root, rel_grouped, joinrel, + rel_grouped->agg_info); + set_cheapest(rel_grouped); + } + } + /* Absorb new clump into old */ old_clump->joinrel = joinrel; old_clump->size += new_clump->size; @@ -336,3 +370,27 @@ desirable_join(PlannerInfo *root, /* Otherwise postpone the join till later. */ return false; } + +/* + * Save the original length and hashtable of a RelInfoList. + */ +static RelInfoListInfo +save_relinfolist(RelInfoList *relinfo_list) +{ + RelInfoListInfo info; + + info.savelength = list_length(relinfo_list->items); + info.savehash = relinfo_list->hash; + + return info; +} + +/* + * Restore the original length and hashtable of a RelInfoList. 
+ */ +static void +restore_relinfolist(RelInfoList *relinfo_list, RelInfoListInfo *info) +{ + relinfo_list->items = list_truncate(relinfo_list->items, info->savelength); + relinfo_list->hash = info->savehash; +} diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c index 172edb643a..1bd2e63c6f 100644 --- a/src/backend/optimizer/path/allpaths.c +++ b/src/backend/optimizer/path/allpaths.c @@ -40,6 +40,7 @@ #include "optimizer/paths.h" #include "optimizer/plancat.h" #include "optimizer/planner.h" +#include "optimizer/prep.h" #include "optimizer/tlist.h" #include "parser/parse_clause.h" #include "parser/parsetree.h" @@ -47,6 +48,7 @@ #include "port/pg_bitutils.h" #include "rewrite/rewriteManip.h" #include "utils/lsyscache.h" +#include "utils/selfuncs.h" /* Bitmask flags for pushdown_safety_info.unsafeFlags */ @@ -77,6 +79,7 @@ typedef enum pushdown_safe_type /* These parameters are set by GUC */ bool enable_geqo = false; /* just in case GUC doesn't set it */ +bool enable_eager_aggregate = false; int geqo_threshold; int min_parallel_table_scan_size; int min_parallel_index_scan_size; @@ -90,6 +93,7 @@ join_search_hook_type join_search_hook = NULL; static void set_base_rel_consider_startup(PlannerInfo *root); static void set_base_rel_sizes(PlannerInfo *root); +static void setup_base_grouped_rels(PlannerInfo *root); static void set_base_rel_pathlists(PlannerInfo *root); static void set_rel_size(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); @@ -114,6 +118,7 @@ static void set_append_rel_size(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); static void set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, Index rti, RangeTblEntry *rte); +static void set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel); static void generate_orderedappend_paths(PlannerInfo *root, RelOptInfo *rel, List *live_childrels, List *all_child_pathkeys); @@ -182,6 +187,11 @@ make_one_rel(PlannerInfo *root, List 
*joinlist) */ set_base_rel_sizes(root); + /* + * Build grouped base relations for each base rel if possible. + */ + setup_base_grouped_rels(root); + /* * We should now have size estimates for every actual table involved in * the query, and we also know which if any have been deleted from the @@ -323,6 +333,45 @@ set_base_rel_sizes(PlannerInfo *root) } } +/* + * setup_base_grouped_rels + * For each "plain" base relation, build a grouped base relation if eager + * aggregation is possible and if this relation can produce grouped paths. + */ +static void +setup_base_grouped_rels(PlannerInfo *root) +{ + Index rti; + + /* + * If there are no aggregate expressions or grouping expressions, eager + * aggregation is not possible. + */ + if (root->agg_clause_list == NIL || + root->group_expr_list == NIL) + return; + + for (rti = 1; rti < root->simple_rel_array_size; rti++) + { + RelOptInfo *rel = root->simple_rel_array[rti]; + RelOptInfo *rel_grouped; + + /* there may be empty slots corresponding to non-baserel RTEs */ + if (rel == NULL) + continue; + + Assert(rel->relid == rti); /* sanity check on array */ + Assert(IS_SIMPLE_REL(rel)); /* sanity check on rel */ + + rel_grouped = build_simple_grouped_rel(root, rel); + if (rel_grouped) + { + /* Make the grouped relation available for joining. */ + add_grouped_rel(root, rel_grouped); + } + } +} + /* * set_base_rel_pathlists * Finds all paths available for scanning each base-relation entry. @@ -559,6 +608,15 @@ set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, /* Now find the cheapest of the paths for this rel */ set_cheapest(rel); + /* + * If a grouped relation for this rel exists, build partial aggregation + * paths for it. + * + * Note that this can only happen after we've called set_cheapest() for + * this base rel, because we need its cheapest paths. 
+ */ + set_grouped_rel_pathlist(root, rel); + #ifdef OPTIMIZER_DEBUG pprint(rel); #endif @@ -1298,6 +1356,36 @@ set_append_rel_pathlist(PlannerInfo *root, RelOptInfo *rel, add_paths_to_append_rel(root, rel, live_childrels); } +/* + * set_grouped_rel_pathlist + * If a grouped relation for the given 'rel' exists, build partial + * aggregation paths for it. + */ +static void +set_grouped_rel_pathlist(PlannerInfo *root, RelOptInfo *rel) +{ + RelOptInfo *rel_grouped; + + /* + * If there are no aggregate expressions or grouping expressions, eager + * aggregation is not possible. + */ + if (root->agg_clause_list == NIL || + root->group_expr_list == NIL) + return; + + /* Add paths to the grouped base relation if one exists. */ + rel_grouped = find_grouped_rel(root, rel->relids); + if (rel_grouped) + { + Assert(IS_GROUPED_REL(rel_grouped)); + + generate_grouped_paths(root, rel_grouped, rel, + rel_grouped->agg_info); + set_cheapest(rel_grouped); + } +} + /* * add_paths_to_append_rel @@ -3306,6 +3394,311 @@ generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_r } } +/* + * generate_grouped_paths + * Generate paths for a grouped relation by adding sorted and hashed + * partial aggregation paths on top of paths of the plain base or join + * relation. + * + * The information needed is provided by the RelAggInfo structure. + */ +void +generate_grouped_paths(PlannerInfo *root, RelOptInfo *rel_grouped, + RelOptInfo *rel_plain, RelAggInfo *agg_info) +{ + AggClauseCosts agg_costs; + bool can_hash; + bool can_sort; + Path *cheapest_total_path = NULL; + Path *cheapest_partial_path = NULL; + double dNumGroups = 0; + double dNumPartialGroups = 0; + + if (IS_DUMMY_REL(rel_plain)) + { + mark_dummy_rel(rel_grouped); + return; + } + + MemSet(&agg_costs, 0, sizeof(AggClauseCosts)); + get_agg_clause_costs(root, AGGSPLIT_INITIAL_SERIAL, &agg_costs); + + /* + * Determine whether it's possible to perform sort-based implementations + * of grouping.
+ */ + can_sort = grouping_is_sortable(agg_info->group_clauses); + + /* + * Determine whether we should consider hash-based implementations of + * grouping. + */ + Assert(root->numOrderedAggs == 0); + can_hash = (agg_info->group_clauses != NIL && + grouping_is_hashable(agg_info->group_clauses)); + + /* + * Consider whether we should generate partially aggregated non-partial + * paths. We can only do this if we have a non-partial path. + */ + if (rel_plain->pathlist != NIL) + { + cheapest_total_path = rel_plain->cheapest_total_path; + Assert(cheapest_total_path != NULL); + } + + /* + * If parallelism is possible for rel_grouped, then we should consider + * generating partially-grouped partial paths. However, if the plain rel + * has no partial paths, then we can't. + */ + if (rel_grouped->consider_parallel && rel_plain->partial_pathlist != NIL) + { + cheapest_partial_path = linitial(rel_plain->partial_pathlist); + Assert(cheapest_partial_path != NULL); + } + + /* Estimate number of partial groups. */ + if (cheapest_total_path != NULL) + dNumGroups = estimate_num_groups(root, + agg_info->group_exprs, + cheapest_total_path->rows, + NULL, NULL); + if (cheapest_partial_path != NULL) + dNumPartialGroups = estimate_num_groups(root, + agg_info->group_exprs, + cheapest_partial_path->rows, + NULL, NULL); + + if (can_sort && cheapest_total_path != NULL) + { + ListCell *lc; + + /* + * Use any available suitably-sorted path as input, and also consider + * sorting the cheapest-total path. + */ + foreach(lc, rel_plain->pathlist) + { + Path *input_path = (Path *) lfirst(lc); + Path *path; + bool is_sorted; + int presorted_keys; + + /* + * Since the path originates from a non-grouped relation that is + * not aware of eager aggregation, we must ensure that it provides + * the correct input for partial aggregation. 
+ */ + path = (Path *) create_projection_path(root, + rel_grouped, + input_path, + agg_info->agg_input); + + is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys, + path->pathkeys, + &presorted_keys); + if (!is_sorted) + { + /* + * Try at least sorting the cheapest path and also try + * incrementally sorting any path which is partially sorted + * already (no need to deal with paths which have presorted + * keys when incremental sort is disabled unless it's the + * cheapest input path). + */ + if (input_path != cheapest_total_path && + (presorted_keys == 0 || !enable_incremental_sort)) + continue; + + /* + * We've no need to consider both a sort and incremental sort. + * We'll just do a sort if there are no presorted keys and an + * incremental sort when there are presorted keys. + */ + if (presorted_keys == 0 || !enable_incremental_sort) + path = (Path *) create_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + -1.0); + else + path = (Path *) create_incremental_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + presorted_keys, + -1.0); + } + + /* + * qual is NIL because the HAVING clause cannot be evaluated until + * the final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_SORTED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumGroups); + + add_path(rel_grouped, path); + } + } + + if (can_sort && cheapest_partial_path != NULL) + { + ListCell *lc; + + /* Similar to above logic, but for partial paths. */ + foreach(lc, rel_plain->partial_pathlist) + { + Path *input_path = (Path *) lfirst(lc); + Path *path; + bool is_sorted; + int presorted_keys; + + /* + * Since the path originates from a non-grouped relation that is + * not aware of eager aggregation, we must ensure that it provides + * the correct input for partial aggregation. 
+ */ + path = (Path *) create_projection_path(root, + rel_grouped, + input_path, + agg_info->agg_input); + + is_sorted = pathkeys_count_contained_in(agg_info->group_pathkeys, + path->pathkeys, + &presorted_keys); + + if (!is_sorted) + { + /* + * Try at least sorting the cheapest path and also try + * incrementally sorting any path which is partially sorted + * already (no need to deal with paths which have presorted + * keys when incremental sort is disabled unless it's the + * cheapest input path). + */ + if (input_path != cheapest_partial_path && + (presorted_keys == 0 || !enable_incremental_sort)) + continue; + + /* + * We've no need to consider both a sort and incremental sort. + * We'll just do a sort if there are no presorted keys and an + * incremental sort when there are presorted keys. + */ + if (presorted_keys == 0 || !enable_incremental_sort) + path = (Path *) create_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + -1.0); + else + path = (Path *) create_incremental_sort_path(root, + rel_grouped, + path, + agg_info->group_pathkeys, + presorted_keys, + -1.0); + } + + /* + * qual is NIL because the HAVING clause cannot be evaluated until + * the final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_SORTED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumPartialGroups); + + add_partial_path(rel_grouped, path); + } + } + + /* + * Add a partially-grouped HashAgg Path where possible + */ + if (can_hash && cheapest_total_path != NULL) + { + Path *path; + + /* + * Since the path originates from a non-grouped relation that is not + * aware of eager aggregation, we must ensure that it provides the + * correct input for partial aggregation. 
+ */ + path = (Path *) create_projection_path(root, + rel_grouped, + cheapest_total_path, + agg_info->agg_input); + + /* + * qual is NIL because the HAVING clause cannot be evaluated until the + * final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_HASHED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumGroups); + + add_path(rel_grouped, path); + } + + /* + * Now add a partially-grouped HashAgg partial Path where possible + */ + if (can_hash && cheapest_partial_path != NULL) + { + Path *path; + + /* + * Since the path originates from a non-grouped relation that is not + * aware of eager aggregation, we must ensure that it provides the + * correct input for partial aggregation. + */ + path = (Path *) create_projection_path(root, + rel_grouped, + cheapest_partial_path, + agg_info->agg_input); + + /* + * qual is NIL because the HAVING clause cannot be evaluated until the + * final value of the aggregate is known. + */ + path = (Path *) create_agg_path(root, + rel_grouped, + path, + agg_info->target, + AGG_HASHED, + AGGSPLIT_INITIAL_SERIAL, + agg_info->group_clauses, + NIL, + &agg_costs, + dNumPartialGroups); + + add_partial_path(rel_grouped, path); + } +} + /* * make_rel_from_joinlist * Build access paths using a "joinlist" to guide the join path search. @@ -3414,9 +3807,10 @@ make_rel_from_joinlist(PlannerInfo *root, List *joinlist) * needed for these paths need have been instantiated. * * Note to plugin authors: the functions invoked during standard_join_search() - * modify root->join_rel_list and root->join_rel_hash. If you want to do more - * than one join-order search, you'll probably need to save and restore the - * original states of those data structures. See geqo_eval() for an example. + * modify root->join_rel_list->items and root->join_rel_list->hash. 
If you + * want to do more than one join-order search, you'll probably need to save and + * restore the original states of those data structures. See geqo_eval() for + * an example. */ RelOptInfo * standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) @@ -3465,6 +3859,10 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) * * After that, we're done creating paths for the joinrel, so run * set_cheapest(). + * + * In addition, we also run generate_grouped_paths() for the grouped + * relation of each just-processed joinrel, and run set_cheapest() for + * the grouped relation afterwards. */ foreach(lc, root->join_rel_level[lev]) { @@ -3485,6 +3883,27 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels) /* Find and save the cheapest paths for this rel */ set_cheapest(rel); + /* + * Except for the topmost scan/join rel, consider generating + * partial aggregation paths for the grouped relation on top of + * the paths of this rel. After that, we're done creating paths + * for the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(rel->relids, root->all_query_rels)) + { + RelOptInfo *rel_grouped; + + rel_grouped = find_grouped_rel(root, rel->relids); + if (rel_grouped) + { + Assert(IS_GROUPED_REL(rel_grouped)); + + generate_grouped_paths(root, rel_grouped, rel, + rel_grouped->agg_info); + set_cheapest(rel_grouped); + } + } + #ifdef OPTIMIZER_DEBUG pprint(rel); #endif @@ -4353,6 +4772,29 @@ generate_partitionwise_join_paths(PlannerInfo *root, RelOptInfo *rel) if (IS_DUMMY_REL(child_rel)) continue; + /* + * Except for the topmost scan/join rel, consider generating partial + * aggregation paths for the grouped relation on top of the paths of + * this partitioned child-join. After that, we're done creating paths + * for the grouped relation, so run set_cheapest(). + */ + if (!bms_equal(IS_OTHER_REL(rel) ? 
+ rel->top_parent_relids : rel->relids, + root->all_query_rels)) + { + RelOptInfo *rel_grouped; + + rel_grouped = find_grouped_rel(root, child_rel->relids); + if (rel_grouped) + { + Assert(IS_GROUPED_REL(rel_grouped)); + + generate_grouped_paths(root, rel_grouped, child_rel, + rel_grouped->agg_info); + set_cheapest(rel_grouped); + } + } + #ifdef OPTIMIZER_DEBUG pprint(child_rel); #endif diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c index e1523d15df..3c38ed7843 100644 --- a/src/backend/optimizer/path/costsize.c +++ b/src/backend/optimizer/path/costsize.c @@ -180,6 +180,9 @@ static bool cost_qual_eval_walker(Node *node, cost_qual_eval_context *context); static void get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel, ParamPathInfo *param_info, QualCost *qpqual_cost); +static void set_joinpath_size(PlannerInfo *root, Path *path, + Path *outer_path, Path *inner_path, + SpecialJoinInfo *sjinfo, List *restrict_clauses); static bool has_indexed_join_quals(NestPath *path); static double approx_tuple_count(PlannerInfo *root, JoinPath *path, List *quals); @@ -3370,19 +3373,8 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path, if (inner_path_rows <= 0) inner_path_rows = 1; /* Mark the path with the correct row estimate */ - if (path->jpath.path.param_info) - path->jpath.path.rows = path->jpath.path.param_info->ppi_rows; - else - path->jpath.path.rows = path->jpath.path.parent->rows; - - /* For partial paths, scale row estimate. 
*/ - if (path->jpath.path.parallel_workers > 0) - { - double parallel_divisor = get_parallel_divisor(&path->jpath.path); - - path->jpath.path.rows = - clamp_row_est(path->jpath.path.rows / parallel_divisor); - } + set_joinpath_size(root, &path->jpath.path, outer_path, inner_path, + extra->sjinfo, path->jpath.joinrestrictinfo); /* cost of inner-relation source data (we already dealt with outer rel) */ @@ -3822,19 +3814,8 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path, inner_path_rows = 1; /* Mark the path with the correct row estimate */ - if (path->jpath.path.param_info) - path->jpath.path.rows = path->jpath.path.param_info->ppi_rows; - else - path->jpath.path.rows = path->jpath.path.parent->rows; - - /* For partial paths, scale row estimate. */ - if (path->jpath.path.parallel_workers > 0) - { - double parallel_divisor = get_parallel_divisor(&path->jpath.path); - - path->jpath.path.rows = - clamp_row_est(path->jpath.path.rows / parallel_divisor); - } + set_joinpath_size(root, &path->jpath.path, outer_path, inner_path, + extra->sjinfo, path->jpath.joinrestrictinfo); /* * Compute cost of the mergequals and qpquals (other restriction clauses) @@ -4254,19 +4235,8 @@ final_cost_hashjoin(PlannerInfo *root, HashPath *path, path->jpath.path.disabled_nodes = workspace->disabled_nodes; /* Mark the path with the correct row estimate */ - if (path->jpath.path.param_info) - path->jpath.path.rows = path->jpath.path.param_info->ppi_rows; - else - path->jpath.path.rows = path->jpath.path.parent->rows; - - /* For partial paths, scale row estimate. 
*/ - if (path->jpath.path.parallel_workers > 0) - { - double parallel_divisor = get_parallel_divisor(&path->jpath.path); - - path->jpath.path.rows = - clamp_row_est(path->jpath.path.rows / parallel_divisor); - } + set_joinpath_size(root, &path->jpath.path, outer_path, inner_path, + extra->sjinfo, path->jpath.joinrestrictinfo); /* mark the path with estimated # of batches */ path->num_batches = numbatches; @@ -5014,6 +4984,60 @@ get_restriction_qual_cost(PlannerInfo *root, RelOptInfo *baserel, *qpqual_cost = baserel->baserestrictcost; } +/* + * set_joinpath_size + * Set the correct row estimate for the given join path. + * + * 'path' is the join path under consideration. + * 'outer_path', 'inner_path' are Paths that produce the relations being + * joined. + * 'sjinfo' is any SpecialJoinInfo relevant to this join. + * 'restrict_clauses' lists the join clauses that need to be applied at the + * join node. + * + * Note that for a grouped join relation, its paths could have very different + * rowcount estimates, so we need to calculate the rowcount estimate using the + * pair of input paths provided. + */ +static void +set_joinpath_size(PlannerInfo *root, Path *path, + Path *outer_path, Path *inner_path, + SpecialJoinInfo *sjinfo, List *restrict_clauses) +{ + if (IS_GROUPED_REL(path->parent)) + { + /* + * Estimate the number of rows of this grouped join path as the sizes + * of the input paths times the selectivity of the clauses that have + * ended up at this join node. + */ + path->rows = calc_joinrel_size_estimate(root, + path->parent, + outer_path->parent, + inner_path->parent, + outer_path->rows, + inner_path->rows, + sjinfo, + restrict_clauses); + } + else if (path->param_info) + path->rows = path->param_info->ppi_rows; + else + path->rows = path->parent->rows; + + /* + * For partial paths, scale row estimate. We can skip this for grouped + * join paths.
+ */ + if (path->parallel_workers > 0 && !IS_GROUPED_REL(path->parent)) + { + double parallel_divisor = get_parallel_divisor(path); + + path->rows = + clamp_row_est(path->rows / parallel_divisor); + } +} + /* * compute_semi_anti_join_factors diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c index 7db5e30eef..43c9fa9526 100644 --- a/src/backend/optimizer/path/joinrels.c +++ b/src/backend/optimizer/path/joinrels.c @@ -35,6 +35,9 @@ static bool has_legal_joinclause(PlannerInfo *root, RelOptInfo *rel); static bool restriction_is_constant_false(List *restrictlist, RelOptInfo *joinrel, bool only_pushed_down); +static void make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, + RelOptInfo *rel2, RelOptInfo *joinrel, + SpecialJoinInfo *sjinfo, List *restrictlist); static void populate_joinrel_with_paths(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, RelOptInfo *joinrel, SpecialJoinInfo *sjinfo, List *restrictlist); @@ -771,6 +774,10 @@ make_join_rel(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2) return joinrel; } + /* Build a grouped join relation for 'joinrel' if possible. */ + make_grouped_join_rel(root, rel1, rel2, joinrel, sjinfo, + restrictlist); + /* Add paths to the join relation. */ populate_joinrel_with_paths(root, rel1, rel2, joinrel, sjinfo, restrictlist); @@ -882,6 +889,125 @@ add_outer_joins_to_relids(PlannerInfo *root, Relids input_relids, return input_relids; } +/* + * make_grouped_join_rel + * Build a grouped join relation out of 'joinrel' if eager aggregation is + * possible and the 'joinrel' can produce grouped paths. + * + * We also generate partial aggregation paths for the grouped relation by + * joining the grouped paths of 'rel1' to the plain paths of 'rel2', or by + * joining the grouped paths of 'rel2' to the plain paths of 'rel1'. 
+ */ +static void +make_grouped_join_rel(PlannerInfo *root, RelOptInfo *rel1, + RelOptInfo *rel2, RelOptInfo *joinrel, + SpecialJoinInfo *sjinfo, List *restrictlist) +{ + RelOptInfo *rel_grouped; + RelOptInfo *rel1_grouped; + RelOptInfo *rel2_grouped; + bool rel1_empty; + bool rel2_empty; + + /* + * If there are no aggregate expressions or grouping expressions, eager + * aggregation is not possible. + */ + if (root->agg_clause_list == NIL || + root->group_expr_list == NIL) + return; + + /* + * See if we already have a grouped joinrel for this joinrel. + */ + rel_grouped = find_grouped_rel(root, joinrel->relids); + + /* + * Construct a new RelOptInfo for the grouped join relation if there is no + * existing one. + */ + if (rel_grouped == NULL) + { + RelAggInfo *agg_info = NULL; + + /* + * Prepare the information needed to create grouped paths for this + * join relation. + */ + agg_info = create_rel_agg_info(root, joinrel); + if (agg_info == NULL) + return; + + /* build a grouped relation out of the plain relation */ + rel_grouped = build_grouped_rel(root, joinrel); + rel_grouped->reltarget = agg_info->target; + rel_grouped->rows = agg_info->grouped_rows; + rel_grouped->agg_info = agg_info; + + /* + * Make the grouped relation available for further joining or for + * acting as the upper rel representing the result of partial + * aggregation. + */ + add_grouped_rel(root, rel_grouped); + } + + Assert(IS_GROUPED_REL(rel_grouped)); + + /* We may have already proven this grouped join relation to be dummy. */ + if (IS_DUMMY_REL(rel_grouped)) + return; + + /* Retrieve the grouped relations for the two input rels */ + rel1_grouped = find_grouped_rel(root, rel1->relids); + rel2_grouped = find_grouped_rel(root, rel2->relids); + + rel1_empty = (rel1_grouped == NULL || IS_DUMMY_REL(rel1_grouped)); + rel2_empty = (rel2_grouped == NULL || IS_DUMMY_REL(rel2_grouped)); + + /* Nothing to do if there's no grouped relation. 
*/ + if (rel1_empty && rel2_empty) + return; + + /* + * Joining two grouped relations is currently not supported. Grouping one + * side would alter the occurrence of the other side's aggregate transient + * states in the final aggregation input. While this issue could be + * addressed by adjusting the transient states, it is not deemed + * worthwhile for now. + */ + if (!rel1_empty && !rel2_empty) + return; + + /* Generate partial aggregation paths for the grouped relation */ + if (!rel1_empty) + { + populate_joinrel_with_paths(root, rel1_grouped, rel2, rel_grouped, + sjinfo, restrictlist); + + /* + * It shouldn't happen that we have marked rel1_grouped as dummy in + * populate_joinrel_with_paths due to provably constant-false join + * restrictions, hence we wouldn't end up with a plan that has Aggref + * in non-Agg plan node. + */ + Assert(!IS_DUMMY_REL(rel1_grouped)); + } + else if (!rel2_empty) + { + populate_joinrel_with_paths(root, rel1, rel2_grouped, rel_grouped, + sjinfo, restrictlist); + + /* + * It shouldn't happen that we have marked rel2_grouped as dummy in + * populate_joinrel_with_paths due to provably constant-false join + * restrictions, hence we wouldn't end up with a plan that has Aggref + * in non-Agg plan node. + */ + Assert(!IS_DUMMY_REL(rel2_grouped)); + } +} + /* * populate_joinrel_with_paths * Add paths to the given joinrel for given pair of joining relations. 
The @@ -1674,6 +1800,11 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2, adjust_child_relids(joinrel->relids, nappinfos, appinfos))); + /* Build a grouped join relation for 'child_joinrel' if possible */ + make_grouped_join_rel(root, child_rel1, child_rel2, + child_joinrel, child_sjinfo, + child_restrictlist); + /* And make paths for the child join */ populate_joinrel_with_paths(root, child_rel1, child_rel2, child_joinrel, child_sjinfo, diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index f3b9821498..ad468d3796 100644 --- a/src/backend/optimizer/plan/initsplan.c +++ b/src/backend/optimizer/plan/initsplan.c @@ -14,6 +14,7 @@ */ #include "postgres.h" +#include "access/nbtree.h" #include "catalog/pg_type.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" @@ -80,6 +81,8 @@ typedef struct JoinTreeItem } JoinTreeItem; +static void create_agg_clause_infos(PlannerInfo *root); +static void create_grouping_expr_infos(PlannerInfo *root); static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex); static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode, @@ -327,6 +330,262 @@ add_vars_to_targetlist(PlannerInfo *root, List *vars, } } +/* + * setup_eager_aggregation + * Check if eager aggregation is applicable, and if so collect suitable + * aggregate expressions and grouping expressions in the query. + */ +void +setup_eager_aggregation(PlannerInfo *root) +{ + /* + * Don't apply eager aggregation if disabled by user. + */ + if (!enable_eager_aggregate) + return; + + /* + * Don't apply eager aggregation if there are no available GROUP BY + * clauses. + */ + if (!root->processed_groupClause) + return; + + /* + * For now we don't try to support grouping sets. + */ + if (root->parse->groupingSets) + return; + + /* + * For now we don't try to support DISTINCT or ORDER BY aggregates. 
+ */ + if (root->numOrderedAggs > 0) + return; + + /* + * If there are any aggregates that do not support partial mode, or any + * partial aggregates that are non-serializable, do not apply eager + * aggregation. + */ + if (root->hasNonPartialAggs || root->hasNonSerialAggs) + return; + + /* + * We don't try to apply eager aggregation if there are set-returning + * functions in targetlist. + */ + if (root->parse->hasTargetSRFs) + return; + + /* + * Eager aggregation only makes sense if there are multiple base rels in + * the query. + */ + if (bms_membership(root->all_baserels) != BMS_MULTIPLE) + return; + + /* + * Collect aggregate expressions and plain Vars that appear in targetlist + * and havingQual. + */ + create_agg_clause_infos(root); + + /* + * If there are no suitable aggregate expressions, we cannot apply eager + * aggregation. + */ + if (root->agg_clause_list == NIL) + return; + + /* + * Collect grouping expressions that appear in grouping clauses. + */ + create_grouping_expr_infos(root); +} + +/* + * create_agg_clause_infos + * Search the targetlist and havingQual for Aggrefs and plain Vars, and + * create an AggClauseInfo for each Aggref node. + */ +static void +create_agg_clause_infos(PlannerInfo *root) +{ + List *tlist_exprs; + ListCell *lc; + + Assert(root->agg_clause_list == NIL); + Assert(root->tlist_vars == NIL); + + tlist_exprs = pull_var_clause((Node *) root->processed_tlist, + PVC_INCLUDE_AGGREGATES | + PVC_RECURSE_WINDOWFUNCS | + PVC_RECURSE_PLACEHOLDERS); + + /* + * For now we don't try to support GROUPING() expressions. + */ + foreach(lc, tlist_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + + if (IsA(expr, GroupingFunc)) + return; + } + + /* + * Aggregates within the HAVING clause need to be processed in the same + * way as those in the targetlist. Note that HAVING can contain Aggrefs + * but not WindowFuncs. 
+ */ + if (root->parse->havingQual != NULL) + { + List *having_exprs; + + having_exprs = pull_var_clause((Node *) root->parse->havingQual, + PVC_INCLUDE_AGGREGATES | + PVC_RECURSE_PLACEHOLDERS); + if (having_exprs != NIL) + { + tlist_exprs = list_concat(tlist_exprs, having_exprs); + list_free(having_exprs); + } + } + + foreach(lc, tlist_exprs) + { + Expr *expr = (Expr *) lfirst(lc); + Aggref *aggref; + AggClauseInfo *ac_info; + + /* + * collect plain Vars for future reference + */ + if (IsA(expr, Var)) + { + root->tlist_vars = list_append_unique(root->tlist_vars, expr); + continue; + } + + aggref = castNode(Aggref, expr); + + Assert(aggref->aggorder == NIL); + Assert(aggref->aggdistinct == NIL); + + ac_info = makeNode(AggClauseInfo); + ac_info->aggref = aggref; + ac_info->agg_eval_at = pull_varnos(root, (Node *) aggref); + + root->agg_clause_list = + list_append_unique(root->agg_clause_list, ac_info); + } + + list_free(tlist_exprs); +} + +/* + * create_grouping_expr_infos + * Create GroupExprInfo for each expression usable as grouping key. + * + * If any grouping expression is not suitable, we will just return with + * root->group_expr_list being NIL. + */ +static void +create_grouping_expr_infos(PlannerInfo *root) +{ + List *exprs = NIL; + List *sortgrouprefs = NIL; + List *btree_opfamilies = NIL; + ListCell *lc, + *lc1, + *lc2, + *lc3; + + Assert(root->group_expr_list == NIL); + + foreach(lc, root->processed_groupClause) + { + SortGroupClause *sgc = lfirst_node(SortGroupClause, lc); + TargetEntry *tle = get_sortgroupclause_tle(sgc, root->processed_tlist); + TypeCacheEntry *tce; + Oid equalimageproc; + Oid eq_op; + List *eq_opfamilies; + Oid btree_opfamily; + + Assert(tle->ressortgroupref > 0); + + /* + * For now we only support plain Vars as grouping expressions. + */ + if (!IsA(tle->expr, Var)) + return; + + /* + * Eager aggregation is only possible if equality of grouping keys, as + * defined by the equality operator, implies bitwise equality. 
+ * Otherwise, if we put keys with different byte images into the same + * group, we may lose some information that could be needed to + * evaluate upper qual clauses. + * + * For example, the NUMERIC data type is not supported because values + * that fall into the same group according to the equality operator + * (e.g. 0 and 0.0) can have different scale. + */ + tce = lookup_type_cache(exprType((Node *) tle->expr), + TYPECACHE_BTREE_OPFAMILY); + if (!OidIsValid(tce->btree_opf) || + !OidIsValid(tce->btree_opintype)) + return; + + equalimageproc = get_opfamily_proc(tce->btree_opf, + tce->btree_opintype, + tce->btree_opintype, + BTEQUALIMAGE_PROC); + if (!OidIsValid(equalimageproc) || + !DatumGetBool(OidFunctionCall1Coll(equalimageproc, + tce->typcollation, + ObjectIdGetDatum(tce->btree_opintype)))) + return; + + /* + * Get the operator in the btree's opfamily. + */ + eq_op = get_opfamily_member(tce->btree_opf, + tce->btree_opintype, + tce->btree_opintype, + BTEqualStrategyNumber); + if (!OidIsValid(eq_op)) + return; + eq_opfamilies = get_mergejoin_opfamilies(eq_op); + if (!eq_opfamilies) + return; + btree_opfamily = linitial_oid(eq_opfamilies); + + exprs = lappend(exprs, tle->expr); + sortgrouprefs = lappend_int(sortgrouprefs, tle->ressortgroupref); + btree_opfamilies = lappend_oid(btree_opfamilies, btree_opfamily); + } + + /* + * Construct GroupExprInfo for each expression. 
+ */ + forthree(lc1, exprs, lc2, sortgrouprefs, lc3, btree_opfamilies) + { + Expr *expr = (Expr *) lfirst(lc1); + int sortgroupref = lfirst_int(lc2); + Oid btree_opfamily = lfirst_oid(lc3); + GroupExprInfo *ge_info; + + ge_info = makeNode(GroupExprInfo); + ge_info->expr = (Expr *) copyObject(expr); + ge_info->sortgroupref = sortgroupref; + ge_info->btree_opfamily = btree_opfamily; + + root->group_expr_list = lappend(root->group_expr_list, ge_info); + } +} /***************************************************************************** * diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index e17d31a5c3..a8f102beb8 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -64,8 +64,12 @@ query_planner(PlannerInfo *root, * NOTE: append_rel_list was set up by subquery_planner, so do not touch * here. */ - root->join_rel_list = NIL; - root->join_rel_hash = NULL; + root->join_rel_list = makeNode(RelInfoList); + root->join_rel_list->items = NIL; + root->join_rel_list->hash = NULL; + root->grouped_rel_list = makeNode(RelInfoList); + root->grouped_rel_list->items = NIL; + root->grouped_rel_list->hash = NULL; root->join_rel_level = NULL; root->join_cur_level = 0; root->canon_pathkeys = NIL; @@ -76,6 +80,9 @@ query_planner(PlannerInfo *root, root->placeholder_list = NIL; root->placeholder_array = NULL; root->placeholder_array_size = 0; + root->agg_clause_list = NIL; + root->group_expr_list = NIL; + root->tlist_vars = NIL; root->fkey_list = NIL; root->initial_rels = NIL; @@ -257,6 +264,12 @@ query_planner(PlannerInfo *root, */ extract_restriction_or_clauses(root); + /* + * Check if eager aggregation is applicable, and if so, set up + * root->agg_clause_list and root->group_expr_list. + */ + setup_eager_aggregation(root); + /* * Now expand appendrels by adding "otherrels" for their children. 
We * delay this to the end so that we have as much information as possible diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index d92d43a17e..922cb7a793 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -227,7 +227,6 @@ static void add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, RelOptInfo *partially_grouped_rel, const AggClauseCosts *agg_costs, grouping_sets_data *gd, - double dNumGroups, GroupPathExtraData *extra); static RelOptInfo *create_partial_grouping_paths(PlannerInfo *root, RelOptInfo *grouped_rel, @@ -4075,9 +4074,7 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, GroupPathExtraData *extra, RelOptInfo **partially_grouped_rel_p) { - Path *cheapest_path = input_rel->cheapest_total_path; RelOptInfo *partially_grouped_rel = NULL; - double dNumGroups; PartitionwiseAggregateType patype = PARTITIONWISE_AGGREGATE_NONE; /* @@ -4158,23 +4155,16 @@ create_ordinary_grouping_paths(PlannerInfo *root, RelOptInfo *input_rel, /* Gather any partially grouped partial paths. */ if (partially_grouped_rel && partially_grouped_rel->partial_pathlist) - { gather_grouping_paths(root, partially_grouped_rel); - set_cheapest(partially_grouped_rel); - } - /* - * Estimate number of groups. - */ - dNumGroups = get_number_of_groups(root, - cheapest_path->rows, - gd, - extra->targetList); + /* Now choose the best path(s) for partially_grouped_rel. 
*/ + if (partially_grouped_rel && partially_grouped_rel->pathlist) + set_cheapest(partially_grouped_rel); /* Build final grouping paths */ add_paths_to_grouping_rel(root, input_rel, grouped_rel, partially_grouped_rel, agg_costs, gd, - dNumGroups, extra); + extra); /* Give a helpful error if we failed to find any implementation */ if (grouped_rel->pathlist == NIL) @@ -7074,16 +7064,42 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, RelOptInfo *grouped_rel, RelOptInfo *partially_grouped_rel, const AggClauseCosts *agg_costs, - grouping_sets_data *gd, double dNumGroups, + grouping_sets_data *gd, GroupPathExtraData *extra) { Query *parse = root->parse; Path *cheapest_path = input_rel->cheapest_total_path; + Path *cheapest_partially_grouped_path = NULL; ListCell *lc; bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0; bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0; List *havingQual = (List *) extra->havingQual; AggClauseCosts *agg_final_costs = &extra->agg_final_costs; + double dNumGroups = 0; + double dNumFinalGroups = 0; + + /* + * Estimate number of groups for non-split aggregation. + */ + dNumGroups = get_number_of_groups(root, + cheapest_path->rows, + gd, + extra->targetList); + + if (partially_grouped_rel && partially_grouped_rel->pathlist) + { + cheapest_partially_grouped_path = + partially_grouped_rel->cheapest_total_path; + + /* + * Estimate number of groups for final phase of partial aggregation. 
+ */ + dNumFinalGroups = + get_number_of_groups(root, + cheapest_partially_grouped_path->rows, + gd, + extra->targetList); + } if (can_sort) { @@ -7195,7 +7211,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, path = make_ordered_path(root, grouped_rel, path, - partially_grouped_rel->cheapest_total_path, + cheapest_partially_grouped_path, info->pathkeys); if (path == NULL) @@ -7212,7 +7228,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, info->clauses, havingQual, agg_final_costs, - dNumGroups)); + dNumFinalGroups)); else add_path(grouped_rel, (Path *) create_group_path(root, @@ -7220,7 +7236,7 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, path, info->clauses, havingQual, - dNumGroups)); + dNumFinalGroups)); } } @@ -7262,19 +7278,17 @@ add_paths_to_grouping_rel(PlannerInfo *root, RelOptInfo *input_rel, */ if (partially_grouped_rel && partially_grouped_rel->pathlist) { - Path *path = partially_grouped_rel->cheapest_total_path; - add_path(grouped_rel, (Path *) create_agg_path(root, grouped_rel, - path, + cheapest_partially_grouped_path, grouped_rel->reltarget, AGG_HASHED, AGGSPLIT_FINAL_DESERIAL, root->processed_groupClause, havingQual, agg_final_costs, - dNumGroups)); + dNumFinalGroups)); } } @@ -7324,6 +7338,21 @@ create_partial_grouping_paths(PlannerInfo *root, bool can_hash = (extra->flags & GROUPING_CAN_USE_HASH) != 0; bool can_sort = (extra->flags & GROUPING_CAN_USE_SORT) != 0; + /* + * The partially_grouped_rel could have been already created due to eager + * aggregation. + */ + partially_grouped_rel = find_grouped_rel(root, input_rel->relids); + Assert(enable_eager_aggregate || partially_grouped_rel == NULL); + + /* + * It is possible that the partially_grouped_rel created by eager + * aggregation is dummy. In this case we just set it to NULL. It might + * be created again by the following logic if possible. 
+ */ + if (partially_grouped_rel && IS_DUMMY_REL(partially_grouped_rel)) + partially_grouped_rel = NULL; + /* * Consider whether we should generate partially aggregated non-partial * paths. We can only do this if we have a non-partial path, and only if @@ -7347,19 +7376,27 @@ create_partial_grouping_paths(PlannerInfo *root, * If we can't partially aggregate partial paths, and we can't partially * aggregate non-partial paths, then don't bother creating the new * RelOptInfo at all, unless the caller specified force_rel_creation. + * + * Note that the partially_grouped_rel could have been already created and + * populated with appropriate paths by eager aggregation. */ if (cheapest_total_path == NULL && cheapest_partial_path == NULL && + (partially_grouped_rel == NULL || + partially_grouped_rel->pathlist == NIL) && !force_rel_creation) return NULL; /* * Build a new upper relation to represent the result of partially - * aggregating the rows from the input relation. - */ - partially_grouped_rel = fetch_upper_rel(root, - UPPERREL_PARTIAL_GROUP_AGG, - grouped_rel->relids); + * aggregating the rows from the input relation. The relation may already + * exist due to eager aggregation, in which case we don't need to create + * it. + */ + if (partially_grouped_rel == NULL) + partially_grouped_rel = fetch_upper_rel(root, + UPPERREL_PARTIAL_GROUP_AGG, + grouped_rel->relids); partially_grouped_rel->consider_parallel = grouped_rel->consider_parallel; partially_grouped_rel->reloptkind = grouped_rel->reloptkind; @@ -7368,6 +7405,14 @@ create_partial_grouping_paths(PlannerInfo *root, partially_grouped_rel->useridiscurrent = grouped_rel->useridiscurrent; partially_grouped_rel->fdwroutine = grouped_rel->fdwroutine; + /* + * Partially-grouped partial paths may have been generated by eager + * aggregation. If we find that parallelism is not possible for + * partially_grouped_rel, we need to drop these partial paths. 
+ */ + if (!partially_grouped_rel->consider_parallel) + partially_grouped_rel->partial_pathlist = NIL; + /* * Build target list for partial aggregate paths. These paths cannot just * emit the same tlist as regular aggregate paths, because (1) we must diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index 4989722637..4884d9ddea 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -499,6 +499,66 @@ adjust_appendrel_attrs_mutator(Node *node, return (Node *) newinfo; } + /* + * We have to process RelAggInfo nodes specially. + */ + if (IsA(node, RelAggInfo)) + { + RelAggInfo *oldinfo = (RelAggInfo *) node; + RelAggInfo *newinfo = makeNode(RelAggInfo); + + /* Copy all flat-copiable fields */ + memcpy(newinfo, oldinfo, sizeof(RelAggInfo)); + + newinfo->relids = adjust_child_relids(oldinfo->relids, + context->nappinfos, + context->appinfos); + + newinfo->target = (PathTarget *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->target, + context); + + newinfo->agg_input = (PathTarget *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->agg_input, + context); + + newinfo->group_clauses = (List *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->group_clauses, + context); + + newinfo->group_exprs = (List *) + adjust_appendrel_attrs_mutator((Node *) oldinfo->group_exprs, + context); + + return (Node *) newinfo; + } + + /* + * We have to process PathTarget nodes specially. 
+ */ + if (IsA(node, PathTarget)) + { + PathTarget *oldtarget = (PathTarget *) node; + PathTarget *newtarget = makeNode(PathTarget); + + /* Copy all flat-copiable fields */ + memcpy(newtarget, oldtarget, sizeof(PathTarget)); + + if (oldtarget->sortgrouprefs) + { + Size nbytes = list_length(oldtarget->exprs) * sizeof(Index); + + newtarget->exprs = (List *) + adjust_appendrel_attrs_mutator((Node *) oldtarget->exprs, + context); + + newtarget->sortgrouprefs = (Index *) palloc(nbytes); + memcpy(newtarget->sortgrouprefs, oldtarget->sortgrouprefs, nbytes); + } + + return (Node *) newtarget; + } + /* * NOTE: we do not need to recurse into sublinks, because they should * already have been converted to subplans before we see them. diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index fc97bf6ee2..673e181b32 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -262,6 +262,12 @@ compare_path_costs_fuzzily(Path *path1, Path *path2, double fuzz_factor) * unparameterized path, too, if there is one; the users of that list find * it more convenient if that's included. * + * cheapest_parameterized_paths also always includes the fewest-row + * unparameterized path, if there is one, for grouped relations. Different + * paths of a grouped relation can have very different row counts, and in some + * cases the cheapest-total unparameterized path may not be the one with the + * fewest rows. + * * This is normally called only after we've finished constructing the path * list for the rel node.
*/ @@ -271,6 +277,7 @@ set_cheapest(RelOptInfo *parent_rel) Path *cheapest_startup_path; Path *cheapest_total_path; Path *best_param_path; + Path *fewest_row_path; List *parameterized_paths; ListCell *p; @@ -280,6 +287,7 @@ set_cheapest(RelOptInfo *parent_rel) elog(ERROR, "could not devise a query plan for the given query"); cheapest_startup_path = cheapest_total_path = best_param_path = NULL; + fewest_row_path = NULL; parameterized_paths = NIL; foreach(p, parent_rel->pathlist) @@ -341,6 +349,8 @@ set_cheapest(RelOptInfo *parent_rel) if (cheapest_total_path == NULL) { cheapest_startup_path = cheapest_total_path = path; + if (IS_GROUPED_REL(parent_rel)) + fewest_row_path = path; continue; } @@ -364,6 +374,27 @@ set_cheapest(RelOptInfo *parent_rel) compare_pathkeys(cheapest_total_path->pathkeys, path->pathkeys) == PATHKEYS_BETTER2)) cheapest_total_path = path; + + /* + * Find the fewest-row unparameterized path for a grouped + * relation. If we find two paths of the same row count, try to + * keep the one with the cheaper total cost; if the costs are + * identical, keep the better-sorted one. + */ + if (IS_GROUPED_REL(parent_rel)) + { + if (fewest_row_path->rows > path->rows) + fewest_row_path = path; + else if (fewest_row_path->rows == path->rows) + { + cmp = compare_path_costs(fewest_row_path, path, TOTAL_COST); + if (cmp > 0 || + (cmp == 0 && + compare_pathkeys(fewest_row_path->pathkeys, + path->pathkeys) == PATHKEYS_BETTER2)) + fewest_row_path = path; + } + } } } @@ -371,6 +402,10 @@ set_cheapest(RelOptInfo *parent_rel) if (cheapest_total_path) parameterized_paths = lcons(cheapest_total_path, parameterized_paths); + /* Add fewest-row unparameterized path, if any, to parameterized_paths */ + if (fewest_row_path && fewest_row_path != cheapest_total_path) + parameterized_paths = lcons(fewest_row_path, parameterized_paths); + /* * If there is no unparameterized path, use the best parameterized path as * cheapest_total_path (but not as cheapest_startup_path). 
@@ -2787,8 +2822,7 @@ create_projection_path(PlannerInfo *root, pathnode->path.pathtype = T_Result; pathnode->path.parent = rel; pathnode->path.pathtarget = target; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe && @@ -3043,8 +3077,7 @@ create_incremental_sort_path(PlannerInfo *root, pathnode->path.parent = rel; /* Sort doesn't project, so use source path's pathtarget */ pathnode->path.pathtarget = subpath->pathtarget; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; @@ -3091,8 +3124,7 @@ create_sort_path(PlannerInfo *root, pathnode->path.parent = rel; /* Sort doesn't project, so use source path's pathtarget */ pathnode->path.pathtarget = subpath->pathtarget; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; @@ -3253,8 +3285,7 @@ create_agg_path(PlannerInfo *root, pathnode->path.pathtype = T_Agg; pathnode->path.parent = rel; pathnode->path.pathtarget = target; - /* For now, assume we are above any joins, so no parameterization */ - pathnode->path.param_info = NULL; + pathnode->path.param_info = subpath->param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = rel->consider_parallel && subpath->parallel_safe; diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c index d7266e4cdb..6d357db28c 100644 --- 
a/src/backend/optimizer/util/relnode.c +++ b/src/backend/optimizer/util/relnode.c @@ -16,6 +16,7 @@ #include <limits.h> +#include "catalog/pg_constraint.h" #include "miscadmin.h" #include "nodes/nodeFuncs.h" #include "optimizer/appendinfo.h" @@ -27,19 +28,26 @@ #include "optimizer/paths.h" #include "optimizer/placeholder.h" #include "optimizer/plancat.h" +#include "optimizer/planner.h" #include "optimizer/restrictinfo.h" #include "optimizer/tlist.h" +#include "parser/parse_oper.h" #include "parser/parse_relation.h" #include "rewrite/rewriteManip.h" #include "utils/hsearch.h" #include "utils/lsyscache.h" +#include "utils/selfuncs.h" -typedef struct JoinHashEntry +/* + * An entry of a hash table that we use to make lookup for RelOptInfo + * structures more efficient. + */ +typedef struct RelHashEntry { - Relids join_relids; /* hash key --- MUST BE FIRST */ - RelOptInfo *join_rel; -} JoinHashEntry; + Relids relids; /* hash key --- MUST BE FIRST */ + RelOptInfo *rel; +} RelHashEntry; static void build_joinrel_tlist(PlannerInfo *root, RelOptInfo *joinrel, RelOptInfo *input_rel, @@ -83,6 +91,14 @@ static void build_child_join_reltarget(PlannerInfo *root, RelOptInfo *childrel, int nappinfos, AppendRelInfo **appinfos); +static bool eager_aggregation_possible_for_relation(PlannerInfo *root, + RelOptInfo *rel); +static bool init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, + PathTarget *target, PathTarget *agg_input, + List **group_clauses, List **group_exprs); +static bool is_var_in_aggref_only(PlannerInfo *root, Var *var); +static bool is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel); +static Index get_expression_sortgroupref(PlannerInfo *root, Expr *expr); /* @@ -276,6 +292,7 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) rel->joininfo = NIL; rel->has_eclass_joins = false; rel->consider_partitionwise_join = false; /* might get changed later */ + rel->agg_info = NULL; rel->part_scheme = NULL; rel->nparts = -1; 
rel->boundinfo = NULL; @@ -406,6 +423,92 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent) return rel; } +/* + * build_simple_grouped_rel + * Construct a new RelOptInfo for a grouped base relation out of an existing + * non-grouped base relation. + */ +RelOptInfo * +build_simple_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain) +{ + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + /* + * We should have available aggregate expressions and grouping + * expressions, otherwise we cannot reach here. + */ + Assert(root->agg_clause_list != NIL); + Assert(root->group_expr_list != NIL); + + /* nothing to do for dummy rel */ + if (IS_DUMMY_REL(rel_plain)) + return NULL; + + /* + * Prepare the information needed to create grouped paths for this base + * relation. + */ + agg_info = create_rel_agg_info(root, rel_plain); + if (agg_info == NULL) + return NULL; + + /* build a grouped relation out of the plain relation */ + rel_grouped = build_grouped_rel(root, rel_plain); + rel_grouped->reltarget = agg_info->target; + rel_grouped->rows = agg_info->grouped_rows; + rel_grouped->agg_info = agg_info; + + return rel_grouped; +} + +/* + * build_grouped_rel + * Build a grouped relation by flat copying a plain relation and resetting + * the necessary fields. 
+ */ +RelOptInfo * +build_grouped_rel(PlannerInfo *root, RelOptInfo *rel_plain) +{ + RelOptInfo *rel_grouped; + + rel_grouped = makeNode(RelOptInfo); + memcpy(rel_grouped, rel_plain, sizeof(RelOptInfo)); + + /* + * clear path info + */ + rel_grouped->pathlist = NIL; + rel_grouped->ppilist = NIL; + rel_grouped->partial_pathlist = NIL; + rel_grouped->cheapest_startup_path = NULL; + rel_grouped->cheapest_total_path = NULL; + rel_grouped->cheapest_unique_path = NULL; + rel_grouped->cheapest_parameterized_paths = NIL; + + /* + * clear partition info + */ + rel_grouped->part_scheme = NULL; + rel_grouped->nparts = -1; + rel_grouped->boundinfo = NULL; + rel_grouped->partbounds_merged = false; + rel_grouped->partition_qual = NIL; + rel_grouped->part_rels = NULL; + rel_grouped->live_parts = NULL; + rel_grouped->all_partrels = NULL; + rel_grouped->partexprs = NULL; + rel_grouped->nullable_partexprs = NULL; + rel_grouped->consider_partitionwise_join = false; + + /* + * clear size estimates + */ + rel_grouped->rows = 0; + + return rel_grouped; +} + /* * find_base_rel * Find a base or otherrel relation entry, which must already exist. @@ -479,11 +582,11 @@ find_base_rel_ignore_join(PlannerInfo *root, int relid) } /* - * build_join_rel_hash - * Construct the auxiliary hash table for join relations. + * build_rel_hash + * Construct the auxiliary hash table for relations. 
*/ static void -build_join_rel_hash(PlannerInfo *root) +build_rel_hash(RelInfoList *list) { HTAB *hashtab; HASHCTL hash_ctl; @@ -491,47 +594,46 @@ build_join_rel_hash(PlannerInfo *root) /* Create the hash table */ hash_ctl.keysize = sizeof(Relids); - hash_ctl.entrysize = sizeof(JoinHashEntry); + hash_ctl.entrysize = sizeof(RelHashEntry); hash_ctl.hash = bitmap_hash; hash_ctl.match = bitmap_match; hash_ctl.hcxt = CurrentMemoryContext; - hashtab = hash_create("JoinRelHashTable", + hashtab = hash_create("RelHashTable", 256L, &hash_ctl, HASH_ELEM | HASH_FUNCTION | HASH_COMPARE | HASH_CONTEXT); - /* Insert all the already-existing joinrels */ - foreach(l, root->join_rel_list) + /* Insert all the already-existing RelOptInfos */ + foreach(l, list->items) { RelOptInfo *rel = (RelOptInfo *) lfirst(l); - JoinHashEntry *hentry; + RelHashEntry *hentry; bool found; - hentry = (JoinHashEntry *) hash_search(hashtab, - &(rel->relids), - HASH_ENTER, - &found); + hentry = (RelHashEntry *) hash_search(hashtab, + &(rel->relids), + HASH_ENTER, + &found); Assert(!found); - hentry->join_rel = rel; + hentry->rel = rel; } - root->join_rel_hash = hashtab; + list->hash = hashtab; } /* - * find_join_rel - * Returns relation entry corresponding to 'relids' (a set of RT indexes), - * or NULL if none exists. This is for join relations. + * find_rel_info + * Find a RelOptInfo entry corresponding to 'relids'. */ -RelOptInfo * -find_join_rel(PlannerInfo *root, Relids relids) +static RelOptInfo * +find_rel_info(RelInfoList *list, Relids relids) { /* * Switch to using hash lookup when list grows "too long". The threshold * is arbitrary and is known only here. */ - if (!root->join_rel_hash && list_length(root->join_rel_list) > 32) - build_join_rel_hash(root); + if (!list->hash && list_length(list->items) > 32) + build_rel_hash(list); /* * Use either hashtable lookup or linear search, as appropriate. 
@@ -541,23 +643,23 @@ find_join_rel(PlannerInfo *root, Relids relids) * so would force relids out of a register and thus probably slow down the * list-search case. */ - if (root->join_rel_hash) + if (list->hash) { Relids hashkey = relids; - JoinHashEntry *hentry; + RelHashEntry *hentry; - hentry = (JoinHashEntry *) hash_search(root->join_rel_hash, - &hashkey, - HASH_FIND, - NULL); + hentry = (RelHashEntry *) hash_search(list->hash, + &hashkey, + HASH_FIND, + NULL); if (hentry) - return hentry->join_rel; + return hentry->rel; } else { ListCell *l; - foreach(l, root->join_rel_list) + foreach(l, list->items) { RelOptInfo *rel = (RelOptInfo *) lfirst(l); @@ -569,6 +671,28 @@ find_join_rel(PlannerInfo *root, Relids relids) return NULL; } +/* + * find_join_rel + * Returns relation entry corresponding to 'relids' (a set of RT indexes), + * or NULL if none exists. This is for join relations. + */ +RelOptInfo * +find_join_rel(PlannerInfo *root, Relids relids) +{ + return find_rel_info(root->join_rel_list, relids); +} + +/* + * find_grouped_rel + * Returns relation entry corresponding to 'relids' (a set of RT indexes), + * or NULL if none exists. This is for grouped relations. + */ +RelOptInfo * +find_grouped_rel(PlannerInfo *root, Relids relids) +{ + return find_rel_info(root->grouped_rel_list, relids); +} + /* * set_foreign_rel_properties * Set up foreign-join fields if outer and inner relation are foreign @@ -619,31 +743,53 @@ set_foreign_rel_properties(RelOptInfo *joinrel, RelOptInfo *outer_rel, } /* - * add_join_rel - * Add given join relation to the list of join relations in the given - * PlannerInfo. Also add it to the auxiliary hashtable if there is one. + * add_rel_info + * Add given relation to the list, and also add it to the auxiliary + * hashtable if there is one. */ static void -add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) +add_rel_info(RelInfoList *list, RelOptInfo *rel) { - /* GEQO requires us to append the new joinrel to the end of the list! 
*/ - root->join_rel_list = lappend(root->join_rel_list, joinrel); + /* GEQO requires us to append the new relation to the end of the list! */ + list->items = lappend(list->items, rel); /* store it into the auxiliary hashtable if there is one. */ - if (root->join_rel_hash) + if (list->hash) { - JoinHashEntry *hentry; + RelHashEntry *hentry; bool found; - hentry = (JoinHashEntry *) hash_search(root->join_rel_hash, - &(joinrel->relids), - HASH_ENTER, - &found); + hentry = (RelHashEntry *) hash_search(list->hash, + &(rel->relids), + HASH_ENTER, + &found); Assert(!found); - hentry->join_rel = joinrel; + hentry->rel = rel; } } +/* + * add_join_rel + * Add given join relation to the list of join relations in the given + * PlannerInfo. + */ +static void +add_join_rel(PlannerInfo *root, RelOptInfo *joinrel) +{ + add_rel_info(root->join_rel_list, joinrel); +} + +/* + * add_grouped_rel + * Add given grouped relation to the list of grouped relations in the + * given PlannerInfo. + */ +void +add_grouped_rel(PlannerInfo *root, RelOptInfo *rel) +{ + add_rel_info(root->grouped_rel_list, rel); +} + /* * build_join_rel * Returns relation entry corresponding to the union of two given rels, @@ -755,6 +901,7 @@ build_join_rel(PlannerInfo *root, joinrel->joininfo = NIL; joinrel->has_eclass_joins = false; joinrel->consider_partitionwise_join = false; /* might get changed later */ + joinrel->agg_info = NULL; joinrel->parent = NULL; joinrel->top_parent = NULL; joinrel->top_parent_relids = NULL; @@ -939,6 +1086,7 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel, joinrel->joininfo = NIL; joinrel->has_eclass_joins = false; joinrel->consider_partitionwise_join = false; /* might get changed later */ + joinrel->agg_info = NULL; joinrel->parent = parent_joinrel; joinrel->top_parent = parent_joinrel->top_parent ? 
parent_joinrel->top_parent : parent_joinrel; joinrel->top_parent_relids = joinrel->top_parent->relids; @@ -2508,3 +2656,471 @@ build_child_join_reltarget(PlannerInfo *root, childrel->reltarget->cost.per_tuple = parentrel->reltarget->cost.per_tuple; childrel->reltarget->width = parentrel->reltarget->width; } + +/* + * create_rel_agg_info + * Create the RelAggInfo structure for the given relation if it can produce + * grouped paths. The given relation is the non-grouped one which has the + * reltarget already constructed. + */ +RelAggInfo * +create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel) +{ + ListCell *lc; + RelAggInfo *result; + PathTarget *agg_input; + PathTarget *target; + List *group_clauses = NIL; + List *group_exprs = NIL; + + /* + * The lists of aggregate expressions and grouping expressions should have + * been constructed. + */ + Assert(root->agg_clause_list != NIL); + Assert(root->group_expr_list != NIL); + + /* + * If this is a child rel, the grouped rel for its parent rel must have + * been created if it can. So we can just use parent's RelAggInfo if + * there is one, with appropriate variable substitutions. + */ + if (IS_OTHER_REL(rel)) + { + RelOptInfo *rel_grouped; + RelAggInfo *agg_info; + + Assert(!bms_is_empty(rel->top_parent_relids)); + rel_grouped = find_grouped_rel(root, rel->top_parent_relids); + + if (rel_grouped == NULL) + return NULL; + + Assert(IS_GROUPED_REL(rel_grouped)); + /* Must do multi-level transformation */ + agg_info = (RelAggInfo *) + adjust_appendrel_attrs_multilevel(root, + (Node *) rel_grouped->agg_info, + rel, + rel->top_parent); + + agg_info->grouped_rows = + estimate_num_groups(root, agg_info->group_exprs, + rel->rows, NULL, NULL); + + return agg_info; + } + + /* Check if it's possible to produce grouped paths for this relation. */ + if (!eager_aggregation_possible_for_relation(root, rel)) + return NULL; + + /* + * Create targets for the grouped paths and for the input paths of the + * grouped paths. 
+ */ + target = create_empty_pathtarget(); + agg_input = create_empty_pathtarget(); + + /* ... and initialize these targets */ + if (!init_grouping_targets(root, rel, target, agg_input, + &group_clauses, &group_exprs)) + return NULL; + + /* + * Eager aggregation is not applicable if there are no available grouping + * expressions. + */ + if (list_length(group_clauses) == 0) + return NULL; + + /* build the RelAggInfo result */ + result = makeNode(RelAggInfo); + + result->group_clauses = group_clauses; + result->group_exprs = group_exprs; + + /* Calculate pathkeys that represent these grouping requirements */ + result->group_pathkeys = + make_pathkeys_for_sortclauses(root, result->group_clauses, + make_tlist_from_pathtarget(target)); + + /* Add aggregates to the grouping target */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + Aggref *aggref; + + Assert(IsA(ac_info->aggref, Aggref)); + + aggref = (Aggref *) copyObject(ac_info->aggref); + mark_partial_aggref(aggref, AGGSPLIT_INITIAL_SERIAL); + + add_column_to_pathtarget(target, (Expr *) aggref, 0); + } + + /* Set the estimated eval cost and output width for both targets */ + set_pathtarget_cost_width(root, target); + set_pathtarget_cost_width(root, agg_input); + + result->relids = bms_copy(rel->relids); + result->target = target; + result->agg_input = agg_input; + result->grouped_rows = estimate_num_groups(root, result->group_exprs, + rel->rows, NULL, NULL); + + return result; +} + +/* + * eager_aggregation_possible_for_relation + * Check if it's possible to produce grouped paths for the given relation. + */ +static bool +eager_aggregation_possible_for_relation(PlannerInfo *root, RelOptInfo *rel) +{ + ListCell *lc; + int cur_relid; + + /* + * Check to see if the given relation is in the nullable side of an outer + * join.
In this case, we cannot push a partial aggregation down to the + * relation, because the NULL-extended rows produced by the outer join + * would not be available when we perform the partial aggregation, while + * with a non-eager-aggregation plan these rows are available for the + * top-level aggregation. Doing so may result in the rows being grouped + * differently than expected, or produce incorrect values from the + * aggregate functions. + */ + cur_relid = -1; + while ((cur_relid = bms_next_member(rel->relids, cur_relid)) >= 0) + { + RelOptInfo *baserel = find_base_rel_ignore_join(root, cur_relid); + + if (baserel == NULL) + continue; /* ignore outer joins in rel->relids */ + + if (!bms_is_subset(baserel->nulling_relids, rel->relids)) + return false; + } + + /* + * For now we don't try to support PlaceHolderVars. + */ + foreach(lc, rel->reltarget->exprs) + { + Expr *expr = lfirst(lc); + + if (IsA(expr, PlaceHolderVar)) + return false; + } + + /* Caller should only pass base relations or joins. */ + Assert(rel->reloptkind == RELOPT_BASEREL || + rel->reloptkind == RELOPT_JOINREL); + + /* + * Check if all aggregate expressions can be evaluated on this relation + * level. + */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + + Assert(IsA(ac_info->aggref, Aggref)); + + /* + * Give up if any aggregate needs relations other than the current + * one. + * + * If the aggregate needs the current rel plus anything else, grouping + * the current rel could make some input variables unavailable for the + * higher aggregate and also reduce the number of input rows it + * receives. + * + * If the aggregate does not need the current rel at all, then the + * current rel should not be grouped, as we do not support joining two + * grouped relations. 
+ */ + if (!bms_is_subset(ac_info->agg_eval_at, rel->relids)) + return false; + } + + return true; +} + +/* + * init_grouping_targets + * Initialize the target for grouped paths (target) as well as the target + * for paths that generate input for the grouped paths (agg_input). + * + * We also construct the list of SortGroupClauses and the list of grouping + * expressions for the partial aggregation, and return them in *group_clauses + * and *group_exprs. + * + * Return true if the targets could be initialized, false otherwise. + */ +static bool +init_grouping_targets(PlannerInfo *root, RelOptInfo *rel, + PathTarget *target, PathTarget *agg_input, + List **group_clauses, List **group_exprs) +{ + ListCell *lc; + List *possibly_dependent = NIL; + Index maxSortGroupRef; + + /* Identify the max sortgroupref */ + maxSortGroupRef = 0; + foreach(lc, root->processed_tlist) + { + Index ref = ((TargetEntry *) lfirst(lc))->ressortgroupref; + + if (ref > maxSortGroupRef) + maxSortGroupRef = ref; + } + + foreach(lc, rel->reltarget->exprs) + { + Expr *expr = (Expr *) lfirst(lc); + Index sortgroupref; + + /* + * Given that PlaceHolderVar currently prevents us from doing eager + * aggregation, the source target cannot contain anything more complex + * than a Var. + */ + Assert(IsA(expr, Var)); + + /* Get the sortgroupref if the expr can act as grouping expression. */ + sortgroupref = get_expression_sortgroupref(root, expr); + if (sortgroupref > 0) + { + SortGroupClause *sgc; + + /* Find the matching SortGroupClause */ + sgc = get_sortgroupref_clause(sortgroupref, root->processed_groupClause); + Assert(sgc->tleSortGroupRef <= maxSortGroupRef); + + /* + * If the target expression can be used as a grouping key, it + * should be emitted by the grouped paths that have been pushed + * down to this relation level. + */ + add_column_to_pathtarget(target, expr, sortgroupref); + + /* + * ... and it also should be emitted by the input paths.
+ */ + add_column_to_pathtarget(agg_input, expr, sortgroupref); + + /* + * Record this SortGroupClause and grouping expression. Note that + * this SortGroupClause might have already been recorded. + */ + if (!list_member(*group_clauses, sgc)) + { + *group_clauses = lappend(*group_clauses, sgc); + *group_exprs = lappend(*group_exprs, expr); + } + } + else if (is_var_needed_by_join(root, (Var *) expr, rel)) + { + /* + * The expression is needed for an upper join but is neither in + * the GROUP BY clause nor derivable from it using EC (otherwise, + * it would have already been included in the targets above). We + * need to create a special SortGroupClause for this expression. + */ + SortGroupClause *sgc = makeNode(SortGroupClause); + + /* Initialize the SortGroupClause. */ + sgc->tleSortGroupRef = ++maxSortGroupRef; + get_sort_group_operators((castNode(Var, expr))->vartype, + false, true, false, + &sgc->sortop, &sgc->eqop, NULL, + &sgc->hashable); + + /* This expression should be emitted by the grouped paths */ + add_column_to_pathtarget(target, expr, sgc->tleSortGroupRef); + + /* ... and it also should be emitted by the input paths. */ + add_column_to_pathtarget(agg_input, expr, sgc->tleSortGroupRef); + + /* Record this SortGroupClause and grouping expression */ + *group_clauses = lappend(*group_clauses, sgc); + *group_exprs = lappend(*group_exprs, expr); + } + else if (is_var_in_aggref_only(root, (Var *) expr)) + { + /* + * The expression is referenced by an aggregate function pushed + * down to this relation and does not appear elsewhere in the + * targetlist or havingQual. Add it to 'agg_input' but not to + * 'target'. + */ + add_new_column_to_pathtarget(agg_input, expr); + } + else + { + /* + * The expression may be functionally dependent on other + * expressions in the target, but we cannot verify this until all + * target expressions have been constructed. 
+ */ + possibly_dependent = lappend(possibly_dependent, expr); + } + } + + /* + * Now we can verify whether an expression is functionally dependent on + * others. + */ + foreach(lc, possibly_dependent) + { + Var *tvar; + List *deps = NIL; + RangeTblEntry *rte; + + tvar = lfirst_node(Var, lc); + rte = root->simple_rte_array[tvar->varno]; + + if (check_functional_grouping(rte->relid, tvar->varno, + tvar->varlevelsup, + target->exprs, &deps)) + { + /* + * The expression is functionally dependent on other target + * expressions, so it can be included in the targets. Since it + * will not be used as a grouping key, a sortgroupref is not + * needed for it. + */ + add_new_column_to_pathtarget(target, (Expr *) tvar); + add_new_column_to_pathtarget(agg_input, (Expr *) tvar); + } + else + { + /* + * We may arrive here with a grouping expression that is proven + * redundant by EquivalenceClass processing, such as 't1.a' in the + * query below. + * + * select max(t1.c) from t t1, t t2 where t1.a = 1 group by t1.a, + * t1.b; + * + * For now we just give up in this case. + */ + return false; + } + } + + return true; +} + +/* + * is_var_in_aggref_only + * Check whether the given Var appears in aggregate expressions and not + * elsewhere in the targetlist or havingQual. + */ +static bool +is_var_in_aggref_only(PlannerInfo *root, Var *var) +{ + ListCell *lc; + + /* + * Search the list of aggregate expressions for the Var. 
+ */ + foreach(lc, root->agg_clause_list) + { + AggClauseInfo *ac_info = lfirst_node(AggClauseInfo, lc); + List *vars; + + Assert(IsA(ac_info->aggref, Aggref)); + + if (!bms_is_member(var->varno, ac_info->agg_eval_at)) + continue; + + vars = pull_var_clause((Node *) ac_info->aggref, + PVC_RECURSE_AGGREGATES | + PVC_RECURSE_WINDOWFUNCS | + PVC_RECURSE_PLACEHOLDERS); + + if (list_member(vars, var)) + { + list_free(vars); + break; + } + + list_free(vars); + } + + return (lc != NULL && !list_member(root->tlist_vars, var)); +} + +/* + * is_var_needed_by_join + * Check if the given Var is needed by joins above the current rel. + * + * Consider pushing the aggregate avg(b.y) down to relation b for the following + * query: + * + * SELECT a.i, avg(b.y) + * FROM a JOIN b ON a.j = b.j + * GROUP BY a.i; + * + * Column b.j needs to be used as the grouping key because otherwise it cannot + * find its way to the input of the join expression. + */ +static bool +is_var_needed_by_join(PlannerInfo *root, Var *var, RelOptInfo *rel) +{ + Relids relids; + int attno; + RelOptInfo *baserel; + + /* + * Note that when checking if the Var is needed by joins above, we want to + * exclude cases where the Var is only needed in the final output. So + * include "relation 0" in the check. + */ + relids = bms_copy(rel->relids); + relids = bms_add_member(relids, 0); + + baserel = find_base_rel(root, var->varno); + attno = var->varattno - baserel->min_attr; + + return bms_nonempty_difference(baserel->attr_needed[attno], relids); +} + +/* + * get_expression_sortgroupref + * Return sortgroupref if the given 'expr' can be used as a grouping key in + * grouped paths for base or join relations, or 0 otherwise. + * + * We first check if 'expr' is among the grouping expressions. If it is not, + * we then check if 'expr' is known equal to any of the grouping expressions + * due to equivalence relationships. 
+ */ +static Index +get_expression_sortgroupref(PlannerInfo *root, Expr *expr) +{ + ListCell *lc; + + foreach(lc, root->group_expr_list) + { + GroupExprInfo *ge_info = lfirst_node(GroupExprInfo, lc); + + Assert(IsA(ge_info->expr, Var)); + + if (equal(ge_info->expr, expr) || + exprs_known_equal(root, (Node *) expr, (Node *) ge_info->expr, + ge_info->btree_opfamily)) + { + Assert(ge_info->sortgroupref > 0); + + return ge_info->sortgroupref; + } + } + + /* The expression cannot be used as a grouping key. */ + return 0; +} diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c index 686309db58..7896c48fe2 100644 --- a/src/backend/utils/misc/guc_tables.c +++ b/src/backend/utils/misc/guc_tables.c @@ -929,6 +929,16 @@ struct config_bool ConfigureNamesBool[] = false, NULL, NULL, NULL }, + { + {"enable_eager_aggregate", PGC_USERSET, QUERY_TUNING_METHOD, + gettext_noop("Enables eager aggregation."), + NULL, + GUC_EXPLAIN + }, + &enable_eager_aggregate, + false, + NULL, NULL, NULL + }, { {"enable_parallel_append", PGC_USERSET, QUERY_TUNING_METHOD, gettext_noop("Enables the planner's use of parallel append plans."), diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 667e0dc40a..2e9df56cf4 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -413,6 +413,7 @@ #enable_sort = on #enable_tidscan = on #enable_group_by_reordering = on +#enable_eager_aggregate = off # - Planner Cost Constants - diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h index 07e2415398..b2a51b121e 100644 --- a/src/include/nodes/pathnodes.h +++ b/src/include/nodes/pathnodes.h @@ -80,6 +80,25 @@ typedef enum UpperRelationKind /* NB: UPPERREL_FINAL must be last enum entry; it's used to size arrays */ } UpperRelationKind; +/* + * A structure consisting of a list and a hash table to store relations. 
+ * + * For small problems we just scan the list to do lookups, but when there are + * many relations we build a hash table for faster lookups. The hash table is + * present and valid when 'hash' is not NULL. Note that we still maintain the + * list even when using the hash table for lookups; this simplifies life for + * GEQO. + */ +typedef struct RelInfoList +{ + pg_node_attr(no_copy_equal, no_read) + + NodeTag type; + + List *items; + struct HTAB *hash pg_node_attr(read_write_ignore); +} RelInfoList; + /*---------- * PlannerGlobal * Global information for planning/optimization @@ -270,15 +289,16 @@ struct PlannerInfo /* * join_rel_list is a list of all join-relation RelOptInfos we have - * considered in this planning run. For small problems we just scan the - * list to do lookups, but when there are many join relations we build a - * hash table for faster lookups. The hash table is present and valid - * when join_rel_hash is not NULL. Note that we still maintain the list - * even when using the hash table for lookups; this simplifies life for - * GEQO. + * considered in this planning run. */ - List *join_rel_list; - struct HTAB *join_rel_hash pg_node_attr(read_write_ignore); + RelInfoList *join_rel_list; /* list of join-relation RelOptInfos */ + + /* + * grouped_rel_list is a list of all grouped-relation RelOptInfos we have + * considered in this planning run. This is only used by eager + * aggregation. 
+ */ + RelInfoList *grouped_rel_list; /* list of grouped-relation RelOptInfos */ /* * When doing a dynamic-programming-style join search, join_rel_level[k] @@ -373,6 +393,15 @@ struct PlannerInfo /* list of PlaceHolderInfos */ List *placeholder_list; + /* list of AggClauseInfos */ + List *agg_clause_list; + + /* list of GroupExprInfos */ + List *group_expr_list; + + /* list of plain Vars contained in targetlist and havingQual */ + List *tlist_vars; + /* array of PlaceHolderInfos indexed by phid */ struct PlaceHolderInfo **placeholder_array pg_node_attr(read_write_ignore, array_size(placeholder_array_size)); /* allocated size of array */ @@ -998,6 +1027,12 @@ typedef struct RelOptInfo /* consider partitionwise join paths? (if partitioned rel) */ bool consider_partitionwise_join; + /* + * used by eager aggregation: + */ + /* information needed to create grouped paths */ + struct RelAggInfo *agg_info; + /* * inheritance links, if this is an otherrel (otherwise NULL): */ @@ -1071,6 +1106,62 @@ typedef struct RelOptInfo ((rel)->part_scheme && (rel)->boundinfo && (rel)->nparts > 0 && \ (rel)->part_rels && (rel)->partexprs && (rel)->nullable_partexprs) +/* + * Is the given relation a grouped relation? + */ +#define IS_GROUPED_REL(rel) \ + ((rel)->agg_info != NULL) + +/* + * RelAggInfo + * Information needed to create grouped paths for base and join rels. + * + * "relids" is the set of relation identifiers (RT indexes). + * + * "target" is the output tlist for the grouped paths. + * + * "agg_input" is the output tlist for the paths that provide input to the + * grouped paths. One difference from the reltarget of the non-grouped + * relation is that agg_input has its sortgrouprefs[] initialized. + * + * "grouped_rows" is the estimated number of result tuples of the grouped + * relation. + * + * "group_clauses", "group_exprs" and "group_pathkeys" are lists of + * SortGroupClauses, the corresponding grouping expressions and PathKeys + * respectively. 
+ */ +typedef struct RelAggInfo +{ + pg_node_attr(no_copy_equal, no_read, no_query_jumble) + + NodeTag type; + + /* set of base + OJ relids (rangetable indexes) */ + Relids relids; + + /* + * default result targetlist for Paths scanning this grouped relation; + * list of Vars/Exprs, cost, width + */ + struct PathTarget *target; + + /* + * the targetlist for Paths that provide input to the grouped paths + */ + struct PathTarget *agg_input; + + /* estimated number of result tuples */ + Cardinality grouped_rows; + + /* a list of SortGroupClauses */ + List *group_clauses; + /* a list of grouping expressions */ + List *group_exprs; + /* a list of PathKeys */ + List *group_pathkeys; +} RelAggInfo; + /* * IndexOptInfo * Per-index information for planning/optimization @@ -3140,6 +3231,41 @@ typedef struct MinMaxAggInfo Param *param; } MinMaxAggInfo; +/* + * The aggregate expressions that appear in targetlist and having clauses + */ +typedef struct AggClauseInfo +{ + pg_node_attr(no_read, no_query_jumble) + + NodeTag type; + + /* the Aggref expr */ + Aggref *aggref; + + /* lowest level we can evaluate this aggregate at */ + Relids agg_eval_at; +} AggClauseInfo; + +/* + * The grouping expressions that appear in grouping clauses + */ +typedef struct GroupExprInfo +{ + pg_node_attr(no_read, no_query_jumble) + + NodeTag type; + + /* the represented expression */ + Expr *expr; + + /* the tleSortGroupRef of the corresponding SortGroupClause */ + Index sortgroupref; + + /* btree opfamily defining the ordering */ + Oid btree_opfamily; +} GroupExprInfo; + /* * At runtime, PARAM_EXEC slots are used to pass values around from one plan * node to another. 
They can be used to pass values down into subqueries (for diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index 1035e6560c..d3c05a61ba 100644 --- a/src/include/optimizer/pathnode.h +++ b/src/include/optimizer/pathnode.h @@ -314,10 +314,16 @@ extern void setup_simple_rel_arrays(PlannerInfo *root); extern void expand_planner_arrays(PlannerInfo *root, int add_size); extern RelOptInfo *build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent); +extern RelOptInfo *build_simple_grouped_rel(PlannerInfo *root, + RelOptInfo *rel_plain); +extern RelOptInfo *build_grouped_rel(PlannerInfo *root, + RelOptInfo *rel_plain); extern RelOptInfo *find_base_rel(PlannerInfo *root, int relid); extern RelOptInfo *find_base_rel_noerr(PlannerInfo *root, int relid); extern RelOptInfo *find_base_rel_ignore_join(PlannerInfo *root, int relid); extern RelOptInfo *find_join_rel(PlannerInfo *root, Relids relids); +extern void add_grouped_rel(PlannerInfo *root, RelOptInfo *rel); +extern RelOptInfo *find_grouped_rel(PlannerInfo *root, Relids relids); extern RelOptInfo *build_join_rel(PlannerInfo *root, Relids joinrelids, RelOptInfo *outer_rel, @@ -353,4 +359,5 @@ extern RelOptInfo *build_child_join_rel(PlannerInfo *root, SpecialJoinInfo *sjinfo, int nappinfos, AppendRelInfo **appinfos); +extern RelAggInfo *create_rel_agg_info(PlannerInfo *root, RelOptInfo *rel); #endif /* PATHNODE_H */ diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h index a78e90610f..1e7d010ecb 100644 --- a/src/include/optimizer/paths.h +++ b/src/include/optimizer/paths.h @@ -21,6 +21,7 @@ * allpaths.c */ extern PGDLLIMPORT bool enable_geqo; +extern PGDLLIMPORT bool enable_eager_aggregate; extern PGDLLIMPORT int geqo_threshold; extern PGDLLIMPORT int min_parallel_table_scan_size; extern PGDLLIMPORT int min_parallel_index_scan_size; @@ -57,6 +58,10 @@ extern void generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows); extern void 
generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows); +extern void generate_grouped_paths(PlannerInfo *root, + RelOptInfo *rel_grouped, + RelOptInfo *rel_plain, + RelAggInfo *agg_info); extern int compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages, int max_workers); extern void create_partial_bitmap_paths(PlannerInfo *root, RelOptInfo *rel, diff --git a/src/include/optimizer/planmain.h b/src/include/optimizer/planmain.h index aafc173792..cedcd88ebf 100644 --- a/src/include/optimizer/planmain.h +++ b/src/include/optimizer/planmain.h @@ -72,6 +72,7 @@ extern void add_other_rels_to_query(PlannerInfo *root); extern void build_base_rel_tlists(PlannerInfo *root, List *final_tlist); extern void add_vars_to_targetlist(PlannerInfo *root, List *vars, Relids where_needed); +extern void setup_eager_aggregation(PlannerInfo *root); extern void find_lateral_references(PlannerInfo *root); extern void create_lateral_join_info(PlannerInfo *root); extern List *deconstruct_jointree(PlannerInfo *root); diff --git a/src/test/regress/expected/eager_aggregate.out b/src/test/regress/expected/eager_aggregate.out new file mode 100644 index 0000000000..9f63472eff --- /dev/null +++ b/src/test/regress/expected/eager_aggregate.out @@ -0,0 +1,1308 @@ +-- +-- EAGER AGGREGATION +-- Test we can push aggregation down below join +-- +-- Enable eager aggregation, which by default is disabled. 
+SET enable_eager_aggregate TO on; +CREATE TABLE eager_agg_t1 (a int, b int, c double precision); +CREATE TABLE eager_agg_t2 (a int, b int, c double precision); +CREATE TABLE eager_agg_t3 (a int, b int, c double precision); +INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i; +INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i; +ANALYZE eager_agg_t1; +ANALYZE eager_agg_t2; +ANALYZE eager_agg_t3; +-- +-- Test eager aggregation over base rel +-- +-- Perform scan of a table, aggregate the result, join it to the other table +-- and finalize the aggregation. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(18 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort 
+ Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial GroupAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Sort + Output: t2.c, t2.b + Sort Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b +(21 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +RESET enable_hashagg; +-- +-- Test eager aggregation over join rel +-- +-- Perform join of tables, aggregate the result, join it to the other table +-- and finalize the aggregation. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg((t2.c + t3.c)) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg((t2.c + t3.c))) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg((t2.c + t3.c)) + Group Key: t2.b + -> Hash Join + Output: t2.c, t2.b, t3.c + Hash Cond: (t3.a = t2.a) + -> Seq Scan on public.eager_agg_t3 t3 + Output: t3.a, t3.b, t3.c + -> Hash + Output: t2.c, t2.b, t2.a + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b, t2.a +(25 rows) + +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + a | 
avg +---+----- + 1 | 497 + 2 | 499 + 3 | 501 + 4 | 503 + 5 | 505 + 6 | 507 + 7 | 509 + 8 | 511 + 9 | 513 +(9 rows) + +-- Produce results with sorting aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg((t2.c + t3.c)) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Sort Key: t1.a + -> Hash Join + Output: t1.a, (PARTIAL avg((t2.c + t3.c))) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg((t2.c + t3.c))) + -> Partial GroupAggregate + Output: t2.b, PARTIAL avg((t2.c + t3.c)) + Group Key: t2.b + -> Sort + Output: t2.c, t2.b, t3.c + Sort Key: t2.b + -> Hash Join + Output: t2.c, t2.b, t3.c + Hash Cond: (t3.a = t2.a) + -> Seq Scan on public.eager_agg_t3 t3 + Output: t3.a, t3.b, t3.c + -> Hash + Output: t2.c, t2.b, t2.a + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.c, t2.b, t2.a +(28 rows) + +SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 497 + 2 | 499 + 3 | 501 + 4 | 503 + 5 | 505 + 6 | 507 + 7 | 509 + 8 | 511 + 9 | 513 +(9 rows) + +RESET enable_hashagg; +-- +-- Test that eager aggregation works for outer join +-- +-- Ensure aggregation can be pushed down to the non-nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +------------------------------------------------------------------ + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + 
Sort Key: t1.a + -> Hash Right Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(18 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 + | 505 +(10 rows) + +-- Ensure aggregation cannot be pushed down to the nullable side +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; + QUERY PLAN +------------------------------------------------------------ + Sort + Output: t2.b, (avg(t2.c)) + Sort Key: t2.b + -> HashAggregate + Output: t2.b, avg(t2.c) + Group Key: t2.b + -> Hash Right Join + Output: t2.b, t2.c + Hash Cond: (t2.b = t1.b) + -> Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c + -> Hash + Output: t1.b + -> Seq Scan on public.eager_agg_t1 t1 + Output: t1.b +(15 rows) + +SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b; + b | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 + | +(10 rows) + +-- +-- Test that eager aggregation works for parallel plans +-- +SET parallel_setup_cost=0; +SET parallel_tuple_cost=0; +SET min_parallel_table_scan_size=0; +SET max_parallel_workers_per_gather=4; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + QUERY PLAN +--------------------------------------------------------------------------------- + Finalize GroupAggregate + Output: t1.a, avg(t2.c) + Group Key: t1.a + 
-> Gather Merge + Output: t1.a, (PARTIAL avg(t2.c)) + Workers Planned: 2 + -> Sort + Output: t1.a, (PARTIAL avg(t2.c)) + Sort Key: t1.a + -> Parallel Hash Join + Output: t1.a, (PARTIAL avg(t2.c)) + Hash Cond: (t1.b = t2.b) + -> Parallel Seq Scan on public.eager_agg_t1 t1 + Output: t1.a, t1.b, t1.c + -> Parallel Hash + Output: t2.b, (PARTIAL avg(t2.c)) + -> Partial HashAggregate + Output: t2.b, PARTIAL avg(t2.c) + Group Key: t2.b + -> Parallel Seq Scan on public.eager_agg_t2 t2 + Output: t2.a, t2.b, t2.c +(21 rows) + +SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a; + a | avg +---+----- + 1 | 496 + 2 | 497 + 3 | 498 + 4 | 499 + 5 | 500 + 6 | 501 + 7 | 502 + 8 | 503 + 9 | 504 +(9 rows) + +RESET parallel_setup_cost; +RESET parallel_tuple_cost; +RESET min_parallel_table_scan_size; +RESET max_parallel_workers_per_gather; +DROP TABLE eager_agg_t1; +DROP TABLE eager_agg_t2; +DROP TABLE eager_agg_t3; +-- +-- Test eager aggregation for partitionwise join +-- +-- Enable partitionwise aggregate, which by default is disabled. +SET enable_partitionwise_aggregate TO true; +-- Enable partitionwise join, which by default is disabled. 
+SET enable_partitionwise_join TO true;
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum(t1.y)), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum(t1.y), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + Hash Cond: (t2.y = t1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2 + Output: t2.y + -> Hash + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*) + Group Key: t1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.x, t1.y + -> Finalize HashAggregate + Output: t1_1.x, sum(t1_1.y), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_1 + Output: t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.x, t1_1.y + -> Finalize HashAggregate + Output: t1_2.x, sum(t1_2.y), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_2 + Output: t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.x, t1_2.y +(49 rows) + +SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+------+------- + 0 | 500 | 100 + 
6 | 1100 | 100 + 12 | 700 | 100 + 18 | 1300 | 100 + 24 | 900 | 100 +(5 rows) + +-- GROUP BY having other matching key +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t2.y, (sum(t1.y)), (count(*)) + Sort Key: t2.y + -> Append + -> Finalize HashAggregate + Output: t2.y, sum(t1.y), count(*) + Group Key: t2.y + -> Hash Join + Output: t2.y, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + Hash Cond: (t2.y = t1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2 + Output: t2.y + -> Hash + Output: t1.x, (PARTIAL sum(t1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1.x, PARTIAL sum(t1.y), PARTIAL count(*) + Group Key: t1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.y, t1.x + -> Finalize HashAggregate + Output: t2_1.y, sum(t1_1.y), count(*) + Group Key: t2_1.y + -> Hash Join + Output: t2_1.y, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_1 + Output: t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.y), PARTIAL count(*) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.y, t1_1.x + -> Finalize HashAggregate + Output: t2_2.y, sum(t1_2.y), count(*) + Group Key: t2_2.y + -> Hash Join + Output: t2_2.y, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_2 + Output: t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.y), PARTIAL count(*) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.y, t1_2.x +(49 rows) + +SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, 
eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y; + y | sum | count +----+------+------- + 0 | 500 | 100 + 6 | 1100 | 100 + 12 | 700 | 100 + 18 | 1300 | 100 + 24 | 900 | 100 +(5 rows) + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; + QUERY PLAN +------------------------------------------------------------------------------------------------------------ + Sort + Output: t2.x, (sum(t1.x)), (count(*)) + Sort Key: t2.x + -> Finalize HashAggregate + Output: t2.x, sum(t1.x), count(*) + Group Key: t2.x + Filter: (avg(t1.x) > '10'::numeric) + -> Append + -> Hash Join + Output: t2_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x)) + Hash Cond: (t2_1.y = t1_1.x) + -> Seq Scan on public.eager_agg_tab2_p1 t2_1 + Output: t2_1.x, t2_1.y + -> Hash + Output: t1_1.x, (PARTIAL sum(t1_1.x)), (PARTIAL count(*)), (PARTIAL avg(t1_1.x)) + -> Partial HashAggregate + Output: t1_1.x, PARTIAL sum(t1_1.x), PARTIAL count(*), PARTIAL avg(t1_1.x) + Group Key: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1_1 + Output: t1_1.x + -> Hash Join + Output: t2_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x)) + Hash Cond: (t2_2.y = t1_2.x) + -> Seq Scan on public.eager_agg_tab2_p2 t2_2 + Output: t2_2.x, t2_2.y + -> Hash + Output: t1_2.x, (PARTIAL sum(t1_2.x)), (PARTIAL count(*)), (PARTIAL avg(t1_2.x)) + -> Partial HashAggregate + Output: t1_2.x, PARTIAL sum(t1_2.x), PARTIAL count(*), PARTIAL avg(t1_2.x) + Group Key: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_2 + Output: t1_2.x + -> Hash Join + Output: t2_3.x, (PARTIAL sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x)) + Hash Cond: (t2_3.y = t1_3.x) + -> Seq Scan on public.eager_agg_tab2_p3 t2_3 + Output: t2_3.x, t2_3.y + -> Hash + Output: t1_3.x, (PARTIAL 
sum(t1_3.x)), (PARTIAL count(*)), (PARTIAL avg(t1_3.x)) + -> Partial HashAggregate + Output: t1_3.x, PARTIAL sum(t1_3.x), PARTIAL count(*), PARTIAL avg(t1_3.x) + Group Key: t1_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_3 + Output: t1_3.x +(44 rows) + +SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x; + x | sum | count +----+------+------- + 2 | 600 | 50 + 4 | 1200 | 50 + 8 | 900 | 50 + 12 | 600 | 50 + 14 | 1200 | 50 + 18 | 900 | 50 +(6 rows) + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +------------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum((t2.y + t3.y))) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum((t2.y + t3.y)) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum((t2.y + t3.y))) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab1_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))) + -> Partial HashAggregate + Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)) + Group Key: t2.x + -> Hash Join + Output: t2.y, t2.x, t3.y, t3.x + Hash Cond: (t2.x = t3.x) + -> Seq Scan on public.eager_agg_tab1_p1 t2 + Output: t2.y, t2.x + -> Hash + Output: t3.y, t3.x + -> Seq Scan on public.eager_agg_tab1_p1 t3 + Output: t3.y, t3.x + -> Finalize HashAggregate + Output: t1_1.x, sum((t2_1.y + t3_1.y)) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab1_p2 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))) + -> Partial HashAggregate + Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + 
t3_1.y)) + Group Key: t2_1.x + -> Hash Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab1_p2 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab1_p2 t3_1 + Output: t3_1.y, t3_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum((t2_2.y + t3_2.y)) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab1_p3 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))) + -> Partial HashAggregate + Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)) + Group Key: t2_2.x + -> Hash Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab1_p3 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, t3_2.x + -> Seq Scan on public.eager_agg_tab1_p3 t3_2 + Output: t3_2.y, t3_2.x +(70 rows) + +SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + x | sum +----+------- + 0 | 10000 + 2 | 14000 + 4 | 18000 + 6 | 22000 + 8 | 26000 + 10 | 10000 + 12 | 14000 + 14 | 18000 + 16 | 22000 + 18 | 26000 + 20 | 10000 + 22 | 14000 + 24 | 18000 + 26 | 22000 + 28 | 26000 +(15 rows) + +-- partial aggregation +SET enable_hashagg TO off; +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + QUERY PLAN +------------------------------------------------------------------------------------------- + Finalize GroupAggregate + Output: t3.y, sum((t2.y + t3.y)) + Group Key: t3.y + -> Sort + Output: t3.y, (PARTIAL sum((t2.y + t3.y))) + Sort Key: t3.y + -> Append + -> Hash Join + Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))) + Hash Cond: (t2_1.x = t1_1.x) + -> 
Partial GroupAggregate + Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)) + Group Key: t2_1.x, t3_1.y, t3_1.x + -> Incremental Sort + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Sort Key: t2_1.x, t3_1.y + Presorted Key: t2_1.x + -> Merge Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Merge Cond: (t2_1.x = t3_1.x) + -> Sort + Output: t2_1.y, t2_1.x + Sort Key: t2_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Sort + Output: t3_1.y, t3_1.x + Sort Key: t3_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t3_1 + Output: t3_1.y, t3_1.x + -> Hash + Output: t1_1.x + -> Seq Scan on public.eager_agg_tab1_p1 t1_1 + Output: t1_1.x + -> Hash Join + Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))) + Hash Cond: (t2_2.x = t1_2.x) + -> Partial GroupAggregate + Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)) + Group Key: t2_2.x, t3_2.y, t3_2.x + -> Incremental Sort + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Sort Key: t2_2.x, t3_2.y + Presorted Key: t2_2.x + -> Merge Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Merge Cond: (t2_2.x = t3_2.x) + -> Sort + Output: t2_2.y, t2_2.x + Sort Key: t2_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t2_2 + Output: t2_2.y, t2_2.x + -> Sort + Output: t3_2.y, t3_2.x + Sort Key: t3_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t3_2 + Output: t3_2.y, t3_2.x + -> Hash + Output: t1_2.x + -> Seq Scan on public.eager_agg_tab1_p2 t1_2 + Output: t1_2.x + -> Hash Join + Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))) + Hash Cond: (t2_3.x = t1_3.x) + -> Partial GroupAggregate + Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)) + Group Key: t2_3.x, t3_3.y, t3_3.x + -> Incremental Sort + Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x + Sort Key: t2_3.x, t3_3.y + Presorted Key: t2_3.x + -> Merge Join + Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x + Merge Cond: (t2_3.x = t3_3.x) + -> Sort + Output: t2_3.y, t2_3.x + Sort Key: t2_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t2_3 + Output: t2_3.y, t2_3.x + 
-> Sort + Output: t3_3.y, t3_3.x + Sort Key: t3_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t3_3 + Output: t3_3.y, t3_3.x + -> Hash + Output: t1_3.x + -> Seq Scan on public.eager_agg_tab1_p3 t1_3 + Output: t1_3.x +(88 rows) + +SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + y | sum +----+------- + 0 | 7500 + 2 | 13500 + 4 | 19500 + 6 | 25500 + 8 | 31500 + 10 | 22500 + 12 | 28500 + 14 | 34500 + 16 | 40500 + 18 | 46500 +(10 rows) + +RESET enable_hashagg; +DROP TABLE eager_agg_tab1; +DROP TABLE eager_agg_tab2; +-- +-- Test with multi-level partitioning scheme +-- +CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10); +CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15); +CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20); +CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x); +CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25); +CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30); +INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i; +ANALYZE eager_agg_tab_ml; +-- When GROUP BY clause matches; full aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum(t2.y)), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum(t2.y), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum(t2.y)), (PARTIAL count(*)) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, (PARTIAL sum(t2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2.x, PARTIAL sum(t2.y), PARTIAL count(*) + Group Key: t2.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t2 + Output: t2.y, t2.x + -> Finalize HashAggregate + Output: t1_1.x, sum(t2_1.y), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*) + Group Key: t2_1.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1 + Output: t2_1.y, t2_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum(t2_2.y), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*) + Group Key: t2_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2 + Output: t2_2.y, t2_2.x + -> Finalize HashAggregate + Output: t1_3.x, sum(t2_3.y), count(*) + Group Key: t1_3.x + -> Hash Join + Output: t1_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + 
Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3 + Output: t1_3.x + -> Hash + Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*) + Group Key: t2_3.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3 + Output: t2_3.y, t2_3.x + -> Finalize HashAggregate + Output: t1_4.x, sum(t2_4.y), count(*) + Group Key: t1_4.x + -> Hash Join + Output: t1_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4 + Output: t1_4.x + -> Hash + Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*) + Group Key: t2_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4 + Output: t2_4.y, t2_4.x +(79 rows) + +SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+-------+------- + 0 | 0 | 1089 + 1 | 1156 | 1156 + 2 | 2312 | 1156 + 3 | 3468 | 1156 + 4 | 4624 | 1156 + 5 | 5780 | 1156 + 6 | 6936 | 1156 + 7 | 8092 | 1156 + 8 | 9248 | 1156 + 9 | 10404 | 1156 + 10 | 11560 | 1156 + 11 | 11979 | 1089 + 12 | 13068 | 1089 + 13 | 14157 | 1089 + 14 | 15246 | 1089 + 15 | 16335 | 1089 + 16 | 17424 | 1089 + 17 | 18513 | 1089 + 18 | 19602 | 1089 + 19 | 20691 | 1089 + 20 | 21780 | 1089 + 21 | 22869 | 1089 + 22 | 23958 | 1089 + 23 | 25047 | 1089 + 24 | 26136 | 1089 + 25 | 27225 | 1089 + 26 | 28314 | 1089 + 27 | 29403 | 1089 + 28 | 30492 | 1089 + 29 | 31581 | 1089 +(30 rows) + +-- When GROUP BY clause does not match; partial aggregation is performed for each partition. 
+EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; + QUERY PLAN +--------------------------------------------------------------------------------------- + Sort + Output: t1.y, (sum(t2.y)), (count(*)) + Sort Key: t1.y + -> Finalize HashAggregate + Output: t1.y, sum(t2.y), count(*) + Group Key: t1.y + -> Append + -> Hash Join + Output: t1_1.y, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1 + Output: t1_1.y, t1_1.x + -> Hash + Output: t2_1.x, (PARTIAL sum(t2_1.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, PARTIAL sum(t2_1.y), PARTIAL count(*) + Group Key: t2_1.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash Join + Output: t1_2.y, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2 + Output: t1_2.y, t1_2.x + -> Hash + Output: t2_2.x, (PARTIAL sum(t2_2.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, PARTIAL sum(t2_2.y), PARTIAL count(*) + Group Key: t2_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2 + Output: t2_2.y, t2_2.x + -> Hash Join + Output: t1_3.y, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3 + Output: t1_3.y, t1_3.x + -> Hash + Output: t2_3.x, (PARTIAL sum(t2_3.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, PARTIAL sum(t2_3.y), PARTIAL count(*) + Group Key: t2_3.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3 + Output: t2_3.y, t2_3.x + -> Hash Join + Output: t1_4.y, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4 + Output: t1_4.y, t1_4.x + -> Hash + Output: t2_4.x, (PARTIAL sum(t2_4.y)), (PARTIAL count(*)) + -> Partial HashAggregate + 
Output: t2_4.x, PARTIAL sum(t2_4.y), PARTIAL count(*) + Group Key: t2_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4 + Output: t2_4.y, t2_4.x + -> Hash Join + Output: t1_5.y, (PARTIAL sum(t2_5.y)), (PARTIAL count(*)) + Hash Cond: (t1_5.x = t2_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5 + Output: t1_5.y, t1_5.x + -> Hash + Output: t2_5.x, (PARTIAL sum(t2_5.y)), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_5.x, PARTIAL sum(t2_5.y), PARTIAL count(*) + Group Key: t2_5.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5 + Output: t2_5.y, t2_5.x +(67 rows) + +SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y; + y | sum | count +----+-------+------- + 0 | 0 | 1089 + 1 | 1156 | 1156 + 2 | 2312 | 1156 + 3 | 3468 | 1156 + 4 | 4624 | 1156 + 5 | 5780 | 1156 + 6 | 6936 | 1156 + 7 | 8092 | 1156 + 8 | 9248 | 1156 + 9 | 10404 | 1156 + 10 | 11560 | 1156 + 11 | 11979 | 1089 + 12 | 13068 | 1089 + 13 | 14157 | 1089 + 14 | 15246 | 1089 + 15 | 16335 | 1089 + 16 | 17424 | 1089 + 17 | 18513 | 1089 + 18 | 19602 | 1089 + 19 | 20691 | 1089 + 20 | 21780 | 1089 + 21 | 22869 | 1089 + 22 | 23958 | 1089 + 23 | 25047 | 1089 + 24 | 26136 | 1089 + 25 | 27225 | 1089 + 26 | 28314 | 1089 + 27 | 29403 | 1089 + 28 | 30492 | 1089 + 29 | 31581 | 1089 +(30 rows) + +-- Check with eager aggregation over join rel +-- full aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + QUERY PLAN +---------------------------------------------------------------------------------------------------------- + Sort + Output: t1.x, (sum((t2.y + t3.y))), (count(*)) + Sort Key: t1.x + -> Append + -> Finalize HashAggregate + Output: t1.x, sum((t2.y + t3.y)), count(*) + Group Key: t1.x + -> Hash Join + Output: t1.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL 
count(*)) + Hash Cond: (t1.x = t2.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1 + Output: t1.x + -> Hash + Output: t2.x, t3.x, (PARTIAL sum((t2.y + t3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2.x, t3.x, PARTIAL sum((t2.y + t3.y)), PARTIAL count(*) + Group Key: t2.x + -> Hash Join + Output: t2.y, t2.x, t3.y, t3.x + Hash Cond: (t2.x = t3.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t2 + Output: t2.y, t2.x + -> Hash + Output: t3.y, t3.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t3 + Output: t3.y, t3.x + -> Finalize HashAggregate + Output: t1_1.x, sum((t2_1.y + t3_1.y)), count(*) + Group Key: t1_1.x + -> Hash Join + Output: t1_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*) + Group Key: t2_1.x + -> Hash Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_1 + Output: t3_1.y, t3_1.x + -> Finalize HashAggregate + Output: t1_2.x, sum((t2_2.y + t3_2.y)), count(*) + Group Key: t1_2.x + -> Hash Join + Output: t1_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, t3_2.x, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*) + Group Key: t2_2.x + -> Hash Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, 
t3_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_2 + Output: t3_2.y, t3_2.x + -> Finalize HashAggregate + Output: t1_3.x, sum((t2_3.y + t3_3.y)), count(*) + Group Key: t1_3.x + -> Hash Join + Output: t1_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_3 + Output: t1_3.x + -> Hash + Output: t2_3.x, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*) + Group Key: t2_3.x + -> Hash Join + Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x + Hash Cond: (t2_3.x = t3_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_3 + Output: t2_3.y, t2_3.x + -> Hash + Output: t3_3.y, t3_3.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_3 + Output: t3_3.y, t3_3.x + -> Finalize HashAggregate + Output: t1_4.x, sum((t2_4.y + t3_4.y)), count(*) + Group Key: t1_4.x + -> Hash Join + Output: t1_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_4 + Output: t1_4.x + -> Hash + Output: t2_4.x, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_4.x, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*) + Group Key: t2_4.x + -> Hash Join + Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x + Hash Cond: (t2_4.x = t3_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_4 + Output: t2_4.y, t2_4.x + -> Hash + Output: t3_4.y, t3_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_4 + Output: t3_4.y, t3_4.x +(114 rows) + +SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x; + x | sum | count +----+---------+------- + 0 | 0 | 35937 + 1 | 78608 | 39304 + 2 | 157216 | 39304 + 3 | 235824 | 39304 + 4 | 314432 | 39304 + 5 | 393040 | 39304 + 6 | 471648 | 39304 
+ 7 | 550256 | 39304 + 8 | 628864 | 39304 + 9 | 707472 | 39304 + 10 | 786080 | 39304 + 11 | 790614 | 35937 + 12 | 862488 | 35937 + 13 | 934362 | 35937 + 14 | 1006236 | 35937 + 15 | 1078110 | 35937 + 16 | 1149984 | 35937 + 17 | 1221858 | 35937 + 18 | 1293732 | 35937 + 19 | 1365606 | 35937 + 20 | 1437480 | 35937 + 21 | 1509354 | 35937 + 22 | 1581228 | 35937 + 23 | 1653102 | 35937 + 24 | 1724976 | 35937 + 25 | 1796850 | 35937 + 26 | 1868724 | 35937 + 27 | 1940598 | 35937 + 28 | 2012472 | 35937 + 29 | 2084346 | 35937 +(30 rows) + +-- partial aggregation +EXPLAIN (VERBOSE, COSTS OFF) +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------ + Sort + Output: t3.y, (sum((t2.y + t3.y))), (count(*)) + Sort Key: t3.y + -> Finalize HashAggregate + Output: t3.y, sum((t2.y + t3.y)), count(*) + Group Key: t3.y + -> Append + -> Hash Join + Output: t3_1.y, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + Hash Cond: (t1_1.x = t2_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t1_1 + Output: t1_1.x + -> Hash + Output: t2_1.x, t3_1.y, t3_1.x, (PARTIAL sum((t2_1.y + t3_1.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_1.x, t3_1.y, t3_1.x, PARTIAL sum((t2_1.y + t3_1.y)), PARTIAL count(*) + Group Key: t2_1.x, t3_1.y, t3_1.x + -> Hash Join + Output: t2_1.y, t2_1.x, t3_1.y, t3_1.x + Hash Cond: (t2_1.x = t3_1.x) + -> Seq Scan on public.eager_agg_tab_ml_p1 t2_1 + Output: t2_1.y, t2_1.x + -> Hash + Output: t3_1.y, t3_1.x + -> Seq Scan on public.eager_agg_tab_ml_p1 t3_1 + Output: t3_1.y, t3_1.x + -> Hash Join + Output: t3_2.y, (PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + Hash Cond: (t1_2.x = t2_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t1_2 + Output: t1_2.x + -> Hash + Output: t2_2.x, t3_2.y, t3_2.x, 
(PARTIAL sum((t2_2.y + t3_2.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_2.x, t3_2.y, t3_2.x, PARTIAL sum((t2_2.y + t3_2.y)), PARTIAL count(*) + Group Key: t2_2.x, t3_2.y, t3_2.x + -> Hash Join + Output: t2_2.y, t2_2.x, t3_2.y, t3_2.x + Hash Cond: (t2_2.x = t3_2.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t2_2 + Output: t2_2.y, t2_2.x + -> Hash + Output: t3_2.y, t3_2.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s1 t3_2 + Output: t3_2.y, t3_2.x + -> Hash Join + Output: t3_3.y, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + Hash Cond: (t1_3.x = t2_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t1_3 + Output: t1_3.x + -> Hash + Output: t2_3.x, t3_3.y, t3_3.x, (PARTIAL sum((t2_3.y + t3_3.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_3.x, t3_3.y, t3_3.x, PARTIAL sum((t2_3.y + t3_3.y)), PARTIAL count(*) + Group Key: t2_3.x, t3_3.y, t3_3.x + -> Hash Join + Output: t2_3.y, t2_3.x, t3_3.y, t3_3.x + Hash Cond: (t2_3.x = t3_3.x) + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t2_3 + Output: t2_3.y, t2_3.x + -> Hash + Output: t3_3.y, t3_3.x + -> Seq Scan on public.eager_agg_tab_ml_p2_s2 t3_3 + Output: t3_3.y, t3_3.x + -> Hash Join + Output: t3_4.y, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + Hash Cond: (t1_4.x = t2_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t1_4 + Output: t1_4.x + -> Hash + Output: t2_4.x, t3_4.y, t3_4.x, (PARTIAL sum((t2_4.y + t3_4.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_4.x, t3_4.y, t3_4.x, PARTIAL sum((t2_4.y + t3_4.y)), PARTIAL count(*) + Group Key: t2_4.x, t3_4.y, t3_4.x + -> Hash Join + Output: t2_4.y, t2_4.x, t3_4.y, t3_4.x + Hash Cond: (t2_4.x = t3_4.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t2_4 + Output: t2_4.y, t2_4.x + -> Hash + Output: t3_4.y, t3_4.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s1 t3_4 + Output: t3_4.y, t3_4.x + -> Hash Join + Output: t3_5.y, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*)) + Hash Cond: (t1_5.x = 
t2_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t1_5 + Output: t1_5.x + -> Hash + Output: t2_5.x, t3_5.y, t3_5.x, (PARTIAL sum((t2_5.y + t3_5.y))), (PARTIAL count(*)) + -> Partial HashAggregate + Output: t2_5.x, t3_5.y, t3_5.x, PARTIAL sum((t2_5.y + t3_5.y)), PARTIAL count(*) + Group Key: t2_5.x, t3_5.y, t3_5.x + -> Hash Join + Output: t2_5.y, t2_5.x, t3_5.y, t3_5.x + Hash Cond: (t2_5.x = t3_5.x) + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t2_5 + Output: t2_5.y, t2_5.x + -> Hash + Output: t3_5.y, t3_5.x + -> Seq Scan on public.eager_agg_tab_ml_p3_s2 t3_5 + Output: t3_5.y, t3_5.x +(102 rows) + +SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y; + y | sum | count +----+---------+------- + 0 | 0 | 35937 + 1 | 78608 | 39304 + 2 | 157216 | 39304 + 3 | 235824 | 39304 + 4 | 314432 | 39304 + 5 | 393040 | 39304 + 6 | 471648 | 39304 + 7 | 550256 | 39304 + 8 | 628864 | 39304 + 9 | 707472 | 39304 + 10 | 786080 | 39304 + 11 | 790614 | 35937 + 12 | 862488 | 35937 + 13 | 934362 | 35937 + 14 | 1006236 | 35937 + 15 | 1078110 | 35937 + 16 | 1149984 | 35937 + 17 | 1221858 | 35937 + 18 | 1293732 | 35937 + 19 | 1365606 | 35937 + 20 | 1437480 | 35937 + 21 | 1509354 | 35937 + 22 | 1581228 | 35937 + 23 | 1653102 | 35937 + 24 | 1724976 | 35937 + 25 | 1796850 | 35937 + 26 | 1868724 | 35937 + 27 | 1940598 | 35937 + 28 | 2012472 | 35937 + 29 | 2084346 | 35937 +(30 rows) + +DROP TABLE eager_agg_tab_ml; diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out index fad7fc3a7e..1dda69e7c2 100644 --- a/src/test/regress/expected/sysviews.out +++ b/src/test/regress/expected/sysviews.out @@ -150,6 +150,7 @@ select name, setting from pg_settings where name like 'enable%'; --------------------------------+--------- enable_async_append | on enable_bitmapscan | on + enable_eager_aggregate | off enable_gathermerge | on 
  enable_group_by_reordering     | on
  enable_hashagg                 | on
@@ -170,7 +171,7 @@ select name, setting from pg_settings where name like 'enable%';
  enable_seqscan                 | on
  enable_sort                    | on
  enable_tidscan                 | on
-(22 rows)
+(23 rows)
 
 -- There are always wait event descriptions for various types.  InjectionPoint
 -- may be present or absent, depending on history since last postmaster start.
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4f38104ba0..7bff358315 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -119,7 +119,7 @@ test: plancache limit plpgsql copy2 temp domain rangefuncs prepare conversion tr
 # The stats test resets stats, so nothing else needing stats access can be in
 # this group.
 # ----------
-test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate
+test: partition_join partition_prune reloptions hash_part indexing partition_aggregate partition_info tuplesort explain compression memoize stats predicate eager_aggregate
 
 # event_trigger depends on create_am and cannot run concurrently with
 # any test that runs DDL
diff --git a/src/test/regress/sql/eager_aggregate.sql b/src/test/regress/sql/eager_aggregate.sql
new file mode 100644
index 0000000000..4050e4df44
--- /dev/null
+++ b/src/test/regress/sql/eager_aggregate.sql
@@ -0,0 +1,192 @@
+--
+-- EAGER AGGREGATION
+-- Test we can push aggregation down below join
+--
+
+-- Enable eager aggregation, which by default is disabled.
+SET enable_eager_aggregate TO on;
+
+CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
+CREATE TABLE eager_agg_t3 (a int, b int, c double precision);
+
+INSERT INTO eager_agg_t1 SELECT i, i, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t2 SELECT i, i%10, i FROM generate_series(1, 1000)i;
+INSERT INTO eager_agg_t3 SELECT i%10, i%10, i FROM generate_series(1, 1000)i;
+
+ANALYZE eager_agg_t1;
+ANALYZE eager_agg_t2;
+ANALYZE eager_agg_t3;
+
+
+--
+-- Test eager aggregation over base rel
+--
+
+-- Perform scan of a table, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test eager aggregation over join rel
+--
+
+-- Perform join of tables, aggregate the result, join it to the other table
+-- and finalize the aggregation.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+-- Produce results with sorting aggregation
+SET enable_hashagg TO off;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c + t3.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b JOIN eager_agg_t3 t3 ON t2.a = t3.a GROUP BY t1.a ORDER BY t1.a;
+
+RESET enable_hashagg;
+
+
+--
+-- Test that eager aggregation works for outer join
+--
+
+-- Ensure aggregation can be pushed down to the non-nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 RIGHT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+-- Ensure aggregation cannot be pushed down to the nullable side
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+SELECT t2.b, avg(t2.c) FROM eager_agg_t1 t1 LEFT JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t2.b ORDER BY t2.b;
+
+
+--
+-- Test that eager aggregation works for parallel plans
+--
+
+SET parallel_setup_cost=0;
+SET parallel_tuple_cost=0;
+SET min_parallel_table_scan_size=0;
+SET max_parallel_workers_per_gather=4;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+SELECT t1.a, avg(t2.c) FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON t1.b = t2.b GROUP BY t1.a ORDER BY t1.a;
+
+RESET parallel_setup_cost;
+RESET parallel_tuple_cost;
+RESET min_parallel_table_scan_size;
+RESET max_parallel_workers_per_gather;
+
+
+DROP TABLE eager_agg_t1;
+DROP TABLE eager_agg_t2;
+DROP TABLE eager_agg_t3;
+
+
+--
+-- Test eager aggregation for partitionwise join
+--
+
+-- Enable partitionwise aggregate, which by default is disabled.
+SET enable_partitionwise_aggregate TO true;
+-- Enable partitionwise join, which by default is disabled.
+SET enable_partitionwise_join TO true;
+
+CREATE TABLE eager_agg_tab1(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab1_p1 PARTITION OF eager_agg_tab1 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab1_p2 PARTITION OF eager_agg_tab1 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab1_p3 PARTITION OF eager_agg_tab1 FOR VALUES FROM (20) TO (30);
+CREATE TABLE eager_agg_tab2(x int, y int) PARTITION BY RANGE(y);
+CREATE TABLE eager_agg_tab2_p1 PARTITION OF eager_agg_tab2 FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab2_p2 PARTITION OF eager_agg_tab2 FOR VALUES FROM (10) TO (20);
+CREATE TABLE eager_agg_tab2_p3 PARTITION OF eager_agg_tab2 FOR VALUES FROM (20) TO (30);
+INSERT INTO eager_agg_tab1 SELECT i % 30, i % 20 FROM generate_series(0, 299, 2) i;
+INSERT INTO eager_agg_tab2 SELECT i % 20, i % 30 FROM generate_series(0, 299, 3) i;
+
+ANALYZE eager_agg_tab1;
+ANALYZE eager_agg_tab2;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t1.x ORDER BY t1.x;
+
+-- GROUP BY having other matching key
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+SELECT t2.y, sum(t1.y), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.y ORDER BY t2.y;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+SELECT t2.x, sum(t1.x), count(*) FROM eager_agg_tab1 t1, eager_agg_tab2 t2 WHERE t1.x = t2.y GROUP BY t2.x HAVING avg(t1.x) > 10 ORDER BY t2.x;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+SET enable_hashagg TO off;
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y) FROM eager_agg_tab1 t1 JOIN eager_agg_tab1 t2 ON t1.x = t2.x JOIN eager_agg_tab1 t3 ON t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+RESET enable_hashagg;
+
+DROP TABLE eager_agg_tab1;
+DROP TABLE eager_agg_tab2;
+
+
+--
+-- Test with multi-level partitioning scheme
+--
+CREATE TABLE eager_agg_tab_ml(x int, y int) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p1 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (0) TO (10);
+CREATE TABLE eager_agg_tab_ml_p2 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (10) TO (20) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p2_s1 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (10) TO (15);
+CREATE TABLE eager_agg_tab_ml_p2_s2 PARTITION OF eager_agg_tab_ml_p2 FOR VALUES FROM (15) TO (20);
+CREATE TABLE eager_agg_tab_ml_p3 PARTITION OF eager_agg_tab_ml FOR VALUES FROM (20) TO (30) PARTITION BY RANGE(x);
+CREATE TABLE eager_agg_tab_ml_p3_s1 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (20) TO (25);
+CREATE TABLE eager_agg_tab_ml_p3_s2 PARTITION OF eager_agg_tab_ml_p3 FOR VALUES FROM (25) TO (30);
+INSERT INTO eager_agg_tab_ml SELECT i % 30, i % 30 FROM generate_series(1, 1000) i;
+
+ANALYZE eager_agg_tab_ml;
+
+-- When GROUP BY clause matches; full aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.x ORDER BY t1.x;
+
+-- When GROUP BY clause does not match; partial aggregation is performed for each partition.
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+SELECT t1.y, sum(t2.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x GROUP BY t1.y ORDER BY t1.y;
+
+-- Check with eager aggregation over join rel
+-- full aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+SELECT t1.x, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t1.x ORDER BY t1.x;
+
+-- partial aggregation
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+SELECT t3.y, sum(t2.y + t3.y), count(*) FROM eager_agg_tab_ml t1 JOIN eager_agg_tab_ml t2 ON t1.x = t2.x JOIN eager_agg_tab_ml t3 on t2.x = t3.x GROUP BY t3.y ORDER BY t3.y;
+
+DROP TABLE eager_agg_tab_ml;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index b6135f0347..89232ae13d 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -41,6 +41,7 @@ AfterTriggersTableData
 AfterTriggersTransData
 Agg
 AggClauseCosts
+AggClauseInfo
 AggInfo
 AggPath
 AggSplit
@@ -1062,6 +1063,7 @@ GrantTargetType
 Group
 GroupByOrdering
 GroupClause
+GroupExprInfo
 GroupPath
 GroupPathExtraData
 GroupResultPath
@@ -1293,7 +1295,6 @@ Join
 JoinCostWorkspace
 JoinDomain
 JoinExpr
-JoinHashEntry
 JoinPath
 JoinPathExtraData
 JoinState
@@ -2374,12 +2375,16 @@ ReindexObjectType
 ReindexParams
 ReindexStmt
 ReindexType
+RelAggInfo
 RelFileLocator
 RelFileLocatorBackend
 RelFileNumber
+RelHashEntry
 RelIdCacheEnt
 RelInfo
 RelInfoArr
+RelInfoList
+RelInfoListInfo
 RelMapFile
 RelMapping
 RelOptInfo
-- 
2.43.0

^ permalink raw reply	[nested|flat] 30+ messages in thread
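
Editorial illustration (not part of the patch): the regression tests above rely on the fact that aggregating below a join and finalizing above it gives the same answer as aggregating after the join. The sketch below is a minimal in-memory model of that equivalence, assuming two toy tables shaped like eager_agg_t1(a, b) and eager_agg_t2(b, c); the function names `plain_avg` and `eager_avg` are hypothetical and do not correspond to anything in the planner.

```python
from collections import defaultdict


def plain_avg(t1, t2):
    """Join t1 and t2 on b first, then compute avg(t2.c) grouped by t1.a."""
    acc = defaultdict(lambda: [0.0, 0])  # a -> [sum, count]
    for a, b1 in t1:
        for b2, c in t2:
            if b1 == b2:
                acc[a][0] += c
                acc[a][1] += 1
    return {a: s / n for a, (s, n) in acc.items()}


def eager_avg(t1, t2):
    """Partially aggregate t2 by the join key b, join, then finalize by t1.a."""
    partial = defaultdict(lambda: [0.0, 0])  # b -> [partial sum, partial count]
    for b, c in t2:
        partial[b][0] += c
        partial[b][1] += 1
    final = defaultdict(lambda: [0.0, 0])  # a -> [combined sum, combined count]
    for a, b in t1:
        if b in partial:
            final[a][0] += partial[b][0]
            final[a][1] += partial[b][1]
    return {a: s / n for a, (s, n) in final.items()}


# Toy data in the spirit of the test tables (scaled down to 100 rows):
# t1 rows are (a, b) = (i, i); t2 rows are (b, c) = (i % 10, i).
t1 = [(i, i) for i in range(1, 101)]
t2 = [(i % 10, float(i)) for i in range(1, 101)]

if __name__ == "__main__":
    print(plain_avg(t1, t2) == eager_avg(t1, t2))
```

The partial step carries (sum, count) rather than an average, which is what lets the final step combine partial states correctly; this mirrors the Partial/Finalize split visible in the EXPLAIN output, but only as an analogy.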
* Re: Eager aggregation, take 3
@ 2024-10-29 12:59 Robert Haas <[email protected]>
  parent: Richard Guo <[email protected]>
  0 siblings, 1 reply; 30+ messages in thread
From: Robert Haas @ 2024-10-29 12:59 UTC (permalink / raw)
  To: Richard Guo <[email protected]>;
  +Cc: Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]

On Wed, Sep 25, 2024 at 3:03 AM Richard Guo <[email protected]> wrote:
> On Wed, Sep 11, 2024 at 10:52 AM Tender Wang <[email protected]> wrote:
> > 1. In make_one_rel(), we have the below codes:
> > /*
> >  * Build grouped base relations for each base rel if possible.
> >  */
> > setup_base_grouped_rels(root);
> >
> > As far as I know, each base rel only has one grouped base relation, if possible.
> > The comments may be changed to "Build a grouped base relation for each base rel if possible."
>
> Yeah, each base rel has only one grouped rel. However, there is a
> comment nearby stating 'consider_parallel flags for each base rel',
> which confuses me about whether it should be singular or plural in
> this context. Perhaps someone more proficient in English could
> clarify this.

It's not confusing the way you have it, but I think an English teacher
wouldn't like it, because part of the sentence is singular ("each base
rel") and the other part is plural ("grouped base relations").
Tender's proposed rewrite fixes that. Another way to fix it is to
write "Build group relations for base rels where possible".

> > 2. According to the comments of generate_grouped_paths(), we may generate paths for a grouped
> > relation on top of paths of join relation. So the "rel_plain" argument in generate_grouped_paths() may be
> > confused. "plain" usually means "base rel". How about Re-naming rel_plain to input_rel?
>
> I don't think 'plain relation' necessarily means 'base relation'. In
> this context I think it can mean 'non-grouped relation'. But maybe
> I'm wrong.

We use the term "plain relation" in several different ways. In the
header comments for addFkRecurseReferenced, it means a non-partitioned
relation. In the struct comments for RangeTblEntry, it means any sort
of named thing in pg_class that you can scan, so either a partitioned
or unpartitioned table but not a join or a table function or
something. AFAICT, the most common meaning of "plain relation" is a
pg_class entry where relkind==RELKIND_RELATION.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

^ permalink raw reply	[nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3
@ 2024-10-29 20:05 Robert Haas <[email protected]>
  parent: Richard Guo <[email protected]>
  1 sibling, 1 reply; 30+ messages in thread
From: Robert Haas @ 2024-10-29 20:05 UTC (permalink / raw)
  To: Richard Guo <[email protected]>;
  +Cc: Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]

On Tue, Sep 24, 2024 at 11:20 PM Richard Guo <[email protected]> wrote:
> The reason is that it is very tricky to set the size estimates for a
> grouped join relation. For a non-grouped join relation, we know that
> all its paths have the same rowcount estimate (well, in theory). But
> this is not true for a grouped join relation. Suppose we have a
> grouped join relation for t1/t2 join. There might be two paths for
> it:

What exactly do you mean by "well, in theory" here? My understanding
of how things work today is that every relation is supposed to produce
a specific set of rows and every unparameterized path must produce
that set of rows. The order of the rows may vary but the set of rows
may not. With your proposed design here, that's no longer true.
Instead, what's promised is that the row sets will become equivalent
after a later FinalizeAggregate step. In a sense, this is like
parameterization or partial paths.

Suppose I have:

SELECT * FROM foo, bar WHERE foo.x = bar.x;

While every unparameterized path for bar has the same row count,
there's also the possibility of performing an index scan on bar.x
parameterized by foo.x, and that path will have a far lower row count
than the unparameterized paths. Instead of producing all the same rows
as every other path, the parameterized path promises only that if run
repeatedly, with all relevant values of foo.x, you'll eventually get
all the same rows you would have gotten from the unparameterized path.
Because of this difference, parameterized paths need special handling
in many different parts of the code.

And the same thing is true of partial paths. They also do not promise
to generate all the same rows -- instead, they promise that when run
simultaneously across multiple workers, the total set of rows returned
across all invocations will be equal to what a normal path would have
produced. Here again, there's a need for special handling because
these paths behave differently than standard paths.

I think what you're doing here is roughly equivalent to either of
these two cases. It's more like the parameterized path case. Instead
of having a path for a relation which is parameterized by some input
parameter, you have a path for a relation, say bar, which is partially
aggregated by some grouping column. But there's no guarantee of how
much partial aggregation has been done. In your example, PartialAgg(t1
JOIN t2) is "more aggregated" than t1 JOIN PartialAgg(t2), so the row
counts are different. This makes me quite nervous. You can't compare a
parameterized path to an unparameterized path, but you can compare it
to another parameterized path if the parameterizations are the same.
You can't compare a partial path to a non-partial path, but you can
compare partial paths to each other. But with this work,
unparameterized, non-partial paths in the same RelOptInfo don't seem
like they are truly comparable. Maybe that's OK, but I'm not sure that
it isn't going to break other things.

You might for example imagine a design where PartialAgg(t1 JOIN t2)
and t1 JOIN PartialAgg(t2) get separate RelOptInfos. After all, there
are probably multiple ways to generate paths for each of those things,
and paths in each category can be compared to each other apples to
apples. What's less clear is whether it's fair to compare across the
two categories, and how many assumptions will be broken by doing so.
I'm not sure that it's right to have separate RelOptInfos; we
definitely don't want to create more RelOptInfos than necessary. At
the same time, if we mix together all of those paths into a single
RelOptInfo, we need to be confident that we're neither going to break
anything nor introduce too many special cases into hot code paths. For
instance, set_joinpath_size() represents an unwelcome complexity
increase that could impact performance generally, even apart from the
cases this patch intends to handle.

It's tempting to wonder if there's some way that we can avoid
generating paths for both PartialAgg(t1 JOIN t2) and t1 JOIN
PartialAgg(t2). Either the former has lower cardinality, or the latter
does. It seems likely that the lower-cardinality set is the winning
strategy. Even if the path has higher cost to generate, we save work
at every subsequent join level and at the final aggregation step. Are
there counterexamples where it's better to use a path from the
higher-cardinality set?

By the way, the work of figuring out what target list should be
produced by partial grouping is done by init_grouping_targets(), but
the comments seem to take it for granted that I know what result we're
trying to produce, and I don't. I think some more high-level
explanation of the goals of this code would be useful. It seems to me
that if I'm looking at a path for an ungrouped relation and it
produces a certain target list, then every column of that target list
is needed somewhere. If those columns are group keys, cool: we pass
those through. If they're inputs to the aggregates, cool: we feed them
to the aggregates. But if they are neither, then what? In the patch,
you either group on those columns or add them to the
possibly_dependent list depending on the result of
is_var_needed_by_join(). I can believe that there are some cases where
we can group on such columns and others where we can't, but find it
difficult to believe that this test reliably distinguishes between
those two cases. If it does, I don't understand why it does. Don't I
need to know something about how those columns are used in the upper
joins? Like, if those columns are connected by a chain of
binary-equality operators back to the user's choice of grouping
columns, that sounds good, but this test doesn't distinguish between
that case and an upper join on the < operator.
create_grouping_expr_infos() does reason based on whether there's an
equal-image operator available, but AIUI that's only reasoning about
the group columns the user mentioned, not the sort of implicit
grouping columns that init_grouping_targets() is creating.

I spent a lot of time thinking today about what makes it safe to push
down grouping or not. I'm not sure that I have a solid answer to that
question even yet. But it seems to me that there are at least two
cases that clearly won't fly. One is the case where the partial
aggregation would merge rows that need to be included in the final
aggregation with rows that should be filtered out. If the
partially-grouped relation simply has a filter qual, that's fine,
because it will be evaluated before the aggregation. But there might
be a qual that has to be evaluated later, either because (a) it
involves another rel, like this_rel.x + that_rel.y > 10 or (b) it
appears in the ON clause of an outer join and thus needs to be
deferred to the level of the OJ, e.g. A LEFT JOIN B ON a.x = b.x AND
b.y = 42. I wonder if you can comment on how these cases are handled.
Perhaps this coding around functional dependencies has something to do
with it, but it isn't clear to me.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

^ permalink raw reply	[nested|flat] 30+ messages in thread
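
Editorial illustration (not from the thread): case (a) above, a qual that involves another rel and must therefore be evaluated after the partial aggregation, can be modelled in a few lines. The sketch assumes two hypothetical toy tables a(k, x) and b(k, y, v) and the query "sum(b.v) over a JOIN b ON a.k = b.k WHERE a.x + b.y > 10"; the function names are invented for the example and say nothing about how the patch actually handles this case.

```python
from collections import defaultdict


def reference(a_rows, b_rows):
    """sum(b.v) over a JOIN b ON a.k = b.k WHERE a.x + b.y > 10."""
    return sum(v
               for (ak, x) in a_rows
               for (bk, y, v) in b_rows
               if ak == bk and x + y > 10)


def pushdown(a_rows, b_rows, keep_y):
    """Partially aggregate b below the join, then join and finalize.

    keep_y=False groups b on k alone: b.y no longer exists above the
    partial aggregate, so the cross-relation qual cannot be checked.
    keep_y=True adds y to the grouping keys, so the qual stays evaluable.
    """
    partial = defaultdict(int)  # (k, y or None) -> partial sum of v
    for k, y, v in b_rows:
        partial[(k, y if keep_y else None)] += v
    total = 0
    for ak, x in a_rows:
        for (bk, y), s in partial.items():
            if ak != bk:
                continue
            if keep_y and not (x + y > 10):
                continue
            total += s
    return total


a_rows = [(1, 5)]                  # (k, x)
b_rows = [(1, 2, 100), (1, 9, 7)]  # (k, y, v)

if __name__ == "__main__":
    print(reference(a_rows, b_rows))              # join first, then filter
    print(pushdown(a_rows, b_rows, keep_y=False))  # rows merged too early
    print(pushdown(a_rows, b_rows, keep_y=True))   # y kept as a grouping key
```

With grouping on k alone, the row that should be filtered out (y = 2) is merged with the row that should survive (y = 9), which is exactly the "merge rows that need to be included with rows that should be filtered out" hazard described above. Keeping y in the grouping keys avoids it here only because the later qual is a plain function of the retained value.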
* Re: Eager aggregation, take 3
@ 2024-11-01 01:50 Richard Guo <[email protected]>
  parent: Robert Haas <[email protected]>
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Guo @ 2024-11-01 01:50 UTC (permalink / raw)
  To: Robert Haas <[email protected]>;
  +Cc: Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]

On Tue, Oct 29, 2024 at 9:59 PM Robert Haas <[email protected]> wrote:
> On Wed, Sep 25, 2024 at 3:03 AM Richard Guo <[email protected]> wrote:
> > On Wed, Sep 11, 2024 at 10:52 AM Tender Wang <[email protected]> wrote:
> > > 1. In make_one_rel(), we have the below codes:
> > > /*
> > >  * Build grouped base relations for each base rel if possible.
> > >  */
> > > setup_base_grouped_rels(root);
> > >
> > > As far as I know, each base rel only has one grouped base relation, if possible.
> > > The comments may be changed to "Build a grouped base relation for each base rel if possible."
> >
> > Yeah, each base rel has only one grouped rel. However, there is a
> > comment nearby stating 'consider_parallel flags for each base rel',
> > which confuses me about whether it should be singular or plural in
> > this context. Perhaps someone more proficient in English could
> > clarify this.
>
> It's not confusing the way you have it, but I think an English teacher
> wouldn't like it, because part of the sentence is singular ("each base
> rel") and the other part is plural ("grouped base relations").
> Tender's proposed rewrite fixes that. Another way to fix it is to
> write "Build group relations for base rels where possible".

Thank you for the suggestion. The new wording looks much better
grammatically. It seems to me that we should address the nearby
comment too, which goes like "consider_parallel flags for each base
rel", as each rel has only one consider_parallel flag.

> > > 2. According to the comments of generate_grouped_paths(), we may generate paths for a grouped
> > > relation on top of paths of join relation. So the "rel_plain" argument in generate_grouped_paths() may be
> > > confused. "plain" usually means "base rel". How about Re-naming rel_plain to input_rel?
> >
> > I don't think 'plain relation' necessarily means 'base relation'. In
> > this context I think it can mean 'non-grouped relation'. But maybe
> > I'm wrong.
>
> We use the term "plain relation" in several different ways. In the
> header comments for addFkRecurseReferenced, it means a non-partitioned
> relation. In the struct comments for RangeTblEntry, it means any sort
> of named thing in pg_class that you can scan, so either a partitioned
> or unpartitioned table but not a join or a table function or
> something. AFAICT, the most common meaning of "plain relation" is a
> pg_class entry where relkind==RELKIND_RELATION.

Agreed.

Thanks
Richard

^ permalink raw reply	[nested|flat] 30+ messages in thread
* Re: Eager aggregation, take 3
@ 2024-11-01 05:54 Richard Guo <[email protected]>
  parent: Robert Haas <[email protected]>
  0 siblings, 0 replies; 30+ messages in thread
From: Richard Guo @ 2024-11-01 05:54 UTC
To: Robert Haas <[email protected]>; +Cc: Tender Wang <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]

On Wed, Oct 30, 2024 at 5:06 AM Robert Haas <[email protected]> wrote:
> On Tue, Sep 24, 2024 at 11:20 PM Richard Guo <[email protected]> wrote:
> > The reason is that it is very tricky to set the size estimates for a
> > grouped join relation. For a non-grouped join relation, we know that
> > all its paths have the same rowcount estimate (well, in theory). But
> > this is not true for a grouped join relation. Suppose we have a
> > grouped join relation for t1/t2 join. There might be two paths for
> > it:
>
> What exactly do you mean by "well, in theory" here? My understanding
> of how things work today is that every relation is supposed to produce
> a specific set of rows and every unparameterized path must produce
> that set of rows. The order of the rows may vary but the set of rows
> may not. With your proposed design here, that's no longer true.
> Instead, what's promised is that the row sets will become equivalent
> after a later FinalizeAggregate step. In a sense, this is like
> parameterization or partial paths.

Yeah, you're correct that currently each relation is expected to
produce the same specific set of rows. When I say "well, in theory" I
mean that for a join relation, all its unparameterized paths should
theoretically have the same row count estimate. However, in practice,
because there is more than one way to make a joinrel for more than two
base relations, and the selectivity estimation routines don't handle
all cases equally well, we might get different row count estimates
depending on the pair provided.

And yes, with the grouped relations proposed in this patch, the
situation changes. For a grouped join relation, different paths can
produce very different row sets, depending on where the partial
aggregation is placed in the path tree. This is also why we need to
recalculate the row count estimate for a grouped join path using the
outer and inner paths provided, rather than using path->parent->rows
directly. This is very much like the parameterized path case.

> I think what you're doing here is roughly equivalent to either of
> these two cases. It's more like the parameterized path case. Instead
> of having a path for a relation which is parameterized by some input
> parameter, you have a path for a relation, say bar, which is partially
> aggregated by some grouping column. But there's no guarantee of how
> much partial aggregation has been done. In your example, PartialAgg(t1
> JOIN t2) is "more aggregated" than t1 JOIN PartialAgg(t2), so the row
> counts are different. This makes me quite nervous. You can't compare a
> parameterized path to an unparameterized path, but you can compare it
> to another parameterized path if the parameterizations are the same.
> You can't compare a partial path to a non-partial path, but you can
> compare partial paths to each other. But with this work,
> unparameterized, non-partial paths in the same RelOptInfo don't seem
> like they are truly comparable. Maybe that's OK, but I'm not sure that
> it isn't going to break other things.

Perhaps we could introduce a GroupPathInfo into the Path structure to
store common information for a grouped path, such as the location of
the partial aggregation (which could be the relids of the relation on
top of which we place the partial aggregation) and the estimated
rowcount for this grouped path, similar to how ParamPathInfo functions
for parameterized paths. Then we should be able to compare grouped
paths with the same location apples to apples.

I haven't thought this through in detail yet, though.

> It's tempting to wonder if there's some way that we can avoid
> generating paths for both PartialAgg(t1 JOIN t2) and t1 JOIN
> PartialAgg(t2). Either the former has lower cardinality, or the latter
> does. It seems likely that the lower-cardinality set is the winning
> strategy. Even if the path has higher cost to generate, we save work
> at every subsequent join level and at the final aggregation step. Are
> there counterexamples where it's better to use a path from the
> higher-cardinality set?

This is very appealing if we can retain only the lower-cardinality
path, as it would greatly simplify the overall design. I haven't
looked for counterexamples yet, but I plan to do so later.

> By the way, the work of figuring out what target list should be
> produced by partial grouping is done by init_grouping_targets(), but
> the comments seem to take it for granted that I know what result we're
> trying to produce, and I don't. I think some more high-level
> explanation of the goals of this code would be useful. It seems to me
> that if I'm looking at a path for an ungrouped relation and it
> produces a certain target list, then every column of that target list
> is needed somewhere. If those columns are group keys, cool: we pass
> those through. If they're inputs to the aggregates, cool: we feed them
> to the aggregates. But if they are neither, then what? In the patch,
> you either group on those columns or add them to the
> possibly_dependent list depending on the result of
> is_var_needed_by_join(). I can believe that there are some cases where
> we can group on such columns and others where we can't, but find it
> difficult to believe that this test reliably distinguishes between
> those two cases. If it does, I don't understand why it does. Don't I
> need to know something about how those columns are used in the upper
> joins? Like, if those columns are connected by a chain of
> binary-equality operators back to the user's choice of grouping
> columns, that sounds good, but this test doesn't distinguish between
> that case and an upper join on the < operator.
> create_grouping_expr_infos() does reason based on whether there's an
> equal-image operator available, but AIUI that's only reasoning about
> the group columns the user mentioned, not the sort of implicit
> grouping columns that init_grouping_targets() is creating.

Yeah, this patch does not get this right. Basically the logic is that
for the partial aggregation pushed down to a non-aggregated relation,
we need to consider all columns of that relation involved in upper
join clauses and include them in the grouping keys. Currently, the
patch only checks whether a column is involved in upper join clauses
but does not verify how the column is used. We need to ensure that the
operator used in the join clause is at least compatible with the
grouping operator; otherwise, the grouping operator might interpret
the values as the same while the join operator sees them as different.

Thanks
Richard
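The operator-compatibility concern in the last paragraph of the message
above can be illustrated with a small standalone C sketch. None of this
is PostgreSQL code: eq_nocase, eq_exact, join_then_count and eager_count
are hypothetical names made up for the illustration. A case-insensitive
comparison stands in for a grouping operator, a byte-wise comparison
stands in for a join operator, and the aggregate is a plain count(*).

```c
#include <ctype.h>
#include <string.h>

#define MAX_GROUPS 16

typedef int (*eq_fn) (const char *, const char *);

/* Stand-in for a grouping operator: case-insensitive equality. */
static int
eq_nocase(const char *a, const char *b)
{
    while (*a && *b)
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both strings ended */
}

/* Stand-in for a join operator: byte-wise equality. */
static int
eq_exact(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* count(*) over (outer JOIN inner), joining first and counting afterwards. */
static long
join_then_count(const char **outer, int nouter,
                const char **inner, int ninner, eq_fn join_eq)
{
    long        total = 0;

    for (int i = 0; i < nouter; i++)
        for (int j = 0; j < ninner; j++)
            if (join_eq(outer[i], inner[j]))
                total++;
    return total;
}

/*
 * Same count, but inner is partially aggregated first: rows are grouped
 * with group_eq, each group keeps its first-seen key as representative,
 * and the join only sees the representatives.  Returns -1 if more than
 * MAX_GROUPS groups would be needed.
 */
static long
eager_count(const char **outer, int nouter,
            const char **inner, int ninner,
            eq_fn group_eq, eq_fn join_eq)
{
    const char *keys[MAX_GROUPS];
    long        counts[MAX_GROUPS];
    int         ngroups = 0;
    long        total = 0;

    /* Partial aggregation: count inner rows per group. */
    for (int j = 0; j < ninner; j++)
    {
        int         g = 0;

        while (g < ngroups && !group_eq(keys[g], inner[j]))
            g++;
        if (g == ngroups)
        {
            if (ngroups == MAX_GROUPS)
                return -1;
            keys[ngroups] = inner[j];
            counts[ngroups] = 0;
            ngroups++;
        }
        counts[g]++;
    }

    /* Join against the group representatives, then sum the partial counts. */
    for (int i = 0; i < nouter; i++)
        for (int g = 0; g < ngroups; g++)
            if (join_eq(outer[i], keys[g]))
                total += counts[g];
    return total;
}
```

With outer = {"abc"} and inner = {"abc", "ABC", "abc"}, joining first
with eq_exact counts 2 rows. Pushing the count below the join while
grouping with eq_nocase folds "ABC" into the "abc" group, so the
exact-match join then reports 3. Grouping with eq_exact, the same
operator the join uses, gives 2 again.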
* Re: Eager aggregation, take 3
@ 2024-11-06 05:04 jian he <[email protected]>
  parent: Richard Guo <[email protected]>
  1 sibling, 0 replies; 30+ messages in thread
From: jian he @ 2024-11-06 05:04 UTC
To: Richard Guo <[email protected]>; +Cc: Robert Haas <[email protected]>; Paul George <[email protected]>; Andy Fan <[email protected]>; pgsql-hackers; [email protected]

On Thu, Aug 29, 2024 at 10:26 AM Richard Guo <[email protected]> wrote:
>
> > 2. I think there might be techniques we could use to limit planning
> > effort at an earlier stage when the approach doesn't appear promising.
> > For example, if the proposed grouping column is already unique, the
> > exercise is pointless (I think). Ideally we'd like to detect that
> > without even creating the grouped_rel. But the proposed grouping
> > column might also be *mostly* unique. For example, consider a table
> > with a million rows and a column 500,000 distinct values. I suspect it
> > will be difficult for partial aggregation to work out to a win in a
> > case like this, because I think that the cost of performing the
> > partial aggregation will not reduce the cost either of the final
> > aggregation or of the intervening join steps by enough to compensate.
> > It would be best to find a way to avoid generating a lot of rels and
> > paths in cases where there's really not much hope of a win.
> >
> > One could, perhaps, imagine going further with this by postponing
> > eager aggregation planning until after regular paths have been built,
> > so that we have good cardinality estimates. Suppose the query joins a
> > single fact table to a series of dimension tables. The final plan thus
> > uses the fact table as the driving table and joins to the dimension
> > tables one by one. Do we really need to consider partial aggregation
> > at every level? Perhaps just where there's been a significant row
> > count reduction since the last time we tried it, but at the next level
> > the row count will increase again?
> >
> > Maybe there are other heuristics we could use in addition or instead.
>
> Yeah, one of my concerns with this work is that it can use
> significantly more CPU time and memory during planning once enabled.
> It would be great if we have some efficient heuristics to limit the
> effort. I'll work on that next and see what happens.
>

In v13, the latest version, we can do the following:

    /* ... and initialize these targets */
    if (!init_grouping_targets(root, rel, target, agg_input,
                               &group_clauses, &group_exprs))
        return NULL;

    if (rel->reloptkind == RELOPT_BASEREL && group_exprs != NIL)
    {
        foreach_node(Var, var, group_exprs)
        {
            if (var->varno == rel->relid &&
                has_unique_index(rel, var->varattno))
                return NULL;
        }
    }

since init_grouping_targets already asserts that group_exprs is a list
of Vars.

--------------------------------------------------------------------------------

Also, in create_rel_agg_info, regarding estimate_num_groups:

    result->group_exprs = group_exprs;
    result->grouped_rows = estimate_num_groups(root, result->group_exprs,
                                               rel->rows, NULL, NULL);

    /*
     * The grouped paths for the given relation are considered useful iff
     * the row reduction ratio is greater than EAGER_AGGREGATE_RATIO.
     */
    agg_info->agg_useful =
        (agg_info->grouped_rows <= rel->rows * (1 - EAGER_AGGREGATE_RATIO));

If group_exprs contains too many Vars, then result->grouped_rows will
be less accurate, and therefore agg_info->agg_useful will be less
accurate too. Should we limit the number of Vars in group_exprs?

For example:

SET enable_eager_aggregate TO on;
drop table if exists eager_agg_t1,eager_agg_t2, eager_agg_t3;
CREATE TABLE eager_agg_t1 (a int, b int, c double precision);
CREATE TABLE eager_agg_t2 (a int, b int, c double precision);
INSERT INTO eager_agg_t1 SELECT i % 100, i, i FROM generate_series(1, 5)i;
INSERT INTO eager_agg_t2 SELECT i % 10, i, i FROM generate_series(1, 5)i;
INSERT INTO eager_agg_t2 SELECT i % 10, i, i FROM generate_series(-4, -2)i;

explain(costs off, verbose, settings)
SELECT t1.a, avg(t2.c)
FROM eager_agg_t1 t1 JOIN eager_agg_t2 t2 ON abs(t1.b) = abs(t2.b % 10 + t2.a)
group by 1;

                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Finalize HashAggregate
   Output: t1.a, avg(t2.c)
   Group Key: t1.a
   ->  Merge Join
         Output: t1.a, (PARTIAL avg(t2.c))
         Merge Cond: ((abs(((t2.b % 10) + t2.a))) = (abs(t1.b)))
         ->  Sort
               Output: t2.b, t2.a, (PARTIAL avg(t2.c)), (abs(((t2.b % 10) + t2.a)))
               Sort Key: (abs(((t2.b % 10) + t2.a)))
               ->  Partial HashAggregate
                     Output: t2.b, t2.a, PARTIAL avg(t2.c), abs(((t2.b % 10) + t2.a))
                     Group Key: t2.b, t2.a
                     ->  Seq Scan on public.eager_agg_t2 t2
                           Output: t2.a, t2.b, t2.c
         ->  Sort
               Output: t1.a, t1.b, (abs(t1.b))
               Sort Key: (abs(t1.b))
               ->  Seq Scan on public.eager_agg_t1 t1
                     Output: t1.a, t1.b, abs(t1.b)
 Settings: enable_eager_aggregate = 'on'
 Query Identifier: -734044107933323262
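The usefulness test quoted in the message above can be tried out in
isolation with a minimal C sketch. agg_is_useful is a hypothetical
standalone function, not part of the patch, and the ratio is passed as
a parameter because the message does not state the value of
EAGER_AGGREGATE_RATIO.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the quoted agg_useful expression: partial
 * aggregation counts as useful only if the estimated number of groups
 * is at most (1 - ratio) times the number of input rows.
 */
static bool
agg_is_useful(double input_rows, double grouped_rows, double ratio)
{
    return grouped_rows <= input_rows * (1.0 - ratio);
}
```

Assuming a ratio of 0.5 purely for illustration: a unique grouping
column on a million-row table (a million groups) fails the test, which
is the case the has_unique_index() check would reject earlier; 500,000
groups sits exactly on the boundary and passes; 1,000 groups passes
comfortably. Since the input to this test is the estimate_num_groups
result, any error in that estimate carries straight through to the
decision.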