public inbox for [email protected]
help / color / mirror / Atom feedFrom: Matheus Alcantara <[email protected]>
To: Alexander Pyhalov <[email protected]>
To: Matheus Alcantara <[email protected]>
Cc: Alena Rybakina <[email protected]>
Cc: Pgsql Hackers <[email protected]>
Subject: Re: Asynchronous MergeAppend
Date: Tue, 30 Dec 2025 12:04:45 -0300
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<CAFY6G8d3Yvxa_kRQA24BsJhwqfmSCv1ujiv_7b6g5isf-ZTs_Q@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
On Tue Dec 30, 2025 at 10:15 AM -03, Alexander Pyhalov wrote:
> Looks good. What do you think about classify_matching_subplans_common()?
> Should it stay where it is or should we hide it to
> src/include/executor/execAppend.h ?
>
Yeah, sounds better to me to move classify_matching_subplans_common to
execAppend.h since it very specific for the MergeAppend and Append
execution. I've declared the function as static inline on execAppend.h
but I'm not sure if it's the best approach.
I've also wrote a proper commit message for this new version.
--
Matheus Alcantara
EDB: https://www.enterprisedb.com
From 14a51e23a506dc7d412dc7130c26b679fed520b9 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov <[email protected]>
Date: Sat, 15 Nov 2025 10:16:25 +0300
Subject: [PATCH v12 1/3] mark_async_capable(): subpath should match subplan
mark_async_capable() believes that path corresponds to plan. This is
not true when create_[merge_]append_plan() inserts sort node. In
this case mark_async_capable() can treat Sort plan node as some
other and crash. Fix this by handling the Sort node separately.
This is needed to make MergeAppend node async-capable that will
be implemented in a next commit.
---
src/backend/optimizer/plan/createplan.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f1a01cfc544..05fac693989 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1139,10 +1139,12 @@ mark_async_capable_plan(Plan *plan, Path *path)
SubqueryScan *scan_plan = (SubqueryScan *) plan;
/*
- * If the generated plan node includes a gating Result node,
- * we can't execute it asynchronously.
+ * Check that plan is really a SubqueryScan before using it.
+ * It can be not true, if the generated plan node includes a
+ * gating Result node or a Sort node. In such case we can't
+ * execute it asynchronously.
*/
- if (IsA(plan, Result))
+ if (!IsA(plan, SubqueryScan))
return false;
/*
@@ -1160,10 +1162,10 @@ mark_async_capable_plan(Plan *plan, Path *path)
FdwRoutine *fdwroutine = path->parent->fdwroutine;
/*
- * If the generated plan node includes a gating Result node,
- * we can't execute it asynchronously.
+ * If the generated plan node includes a gating Result node or
+ * a Sort node, we can't execute it asynchronously.
*/
- if (IsA(plan, Result))
+ if (IsA(plan, Result) || IsA(plan, Sort))
return false;
Assert(fdwroutine != NULL);
@@ -1176,9 +1178,9 @@ mark_async_capable_plan(Plan *plan, Path *path)
/*
* If the generated plan node includes a Result node for the
- * projection, we can't execute it asynchronously.
+ * projection or a Sort node, we can't execute it asynchronously.
*/
- if (IsA(plan, Result))
+ if (IsA(plan, Result) || IsA(plan, Sort))
return false;
/*
--
2.51.2
From c13180f186a924d3ec6987ea8c14eeb61bab10d7 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov <[email protected]>
Date: Sat, 15 Nov 2025 10:23:47 +0300
Subject: [PATCH v12 2/3] MergeAppend should support Async Foreign Scan
subplans
---
.../postgres_fdw/expected/postgres_fdw.out | 288 +++++++++++
contrib/postgres_fdw/postgres_fdw.c | 10 +-
contrib/postgres_fdw/sql/postgres_fdw.sql | 87 ++++
doc/src/sgml/config.sgml | 14 +
src/backend/executor/execAsync.c | 4 +
src/backend/executor/nodeAppend.c | 24 +-
src/backend/executor/nodeMergeAppend.c | 471 +++++++++++++++++-
src/backend/optimizer/path/costsize.c | 1 +
src/backend/optimizer/plan/createplan.c | 9 +
src/backend/utils/misc/guc_parameters.dat | 8 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/executor/nodeMergeAppend.h | 1 +
src/include/nodes/execnodes.h | 59 +++
src/include/optimizer/cost.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
15 files changed, 951 insertions(+), 30 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6066510c7c0..1b35990b4c0 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11556,6 +11556,46 @@ SELECT * FROM result_tbl ORDER BY a;
(2 rows)
DELETE FROM result_tbl;
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: async_pt.b, async_pt.a
+ -> Async Foreign Scan on public.async_p1 async_pt_1
+ Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
+ Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p2 async_pt_2
+ Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
+ Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+(8 rows)
+
+SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
+ a | b | c
+------+-----+------
+ 1000 | 0 | 0000
+ 2000 | 0 | 0000
+ 1100 | 100 | 0100
+ 2100 | 100 | 0100
+ 1200 | 200 | 0200
+ 2200 | 200 | 0200
+ 1300 | 300 | 0300
+ 2300 | 300 | 0300
+ 1400 | 400 | 0400
+ 2400 | 400 | 0400
+ 1500 | 500 | 0500
+ 2500 | 500 | 0500
+ 1600 | 600 | 0600
+ 2600 | 600 | 0600
+ 1700 | 700 | 0700
+ 2700 | 700 | 0700
+ 1800 | 800 | 0800
+ 2800 | 800 | 0800
+ 1900 | 900 | 0900
+ 2900 | 900 | 0900
+(20 rows)
+
-- Test error handling, if accessing one of the foreign partitions errors out
CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (10000) TO (10001)
SERVER loopback OPTIONS (table_name 'non_existent_table');
@@ -11604,6 +11644,76 @@ COPY async_pt TO stdout; --error
ERROR: cannot copy from foreign table "async_p1"
DETAIL: Partition "async_p1" is a foreign table in partitioned table "async_pt"
HINT: Try the COPY (SELECT ...) TO variant.
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: async_pt.b, async_pt.a
+ -> Async Foreign Scan on public.async_p1 async_pt_1
+ Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
+ Filter: (async_pt_1.b === 505)
+ Remote SQL: SELECT a, b, c FROM public.base_tbl1 ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p2 async_pt_2
+ Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
+ Filter: (async_pt_2.b === 505)
+ Remote SQL: SELECT a, b, c FROM public.base_tbl2 ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p3 async_pt_3
+ Output: async_pt_3.a, async_pt_3.b, async_pt_3.c
+ Filter: (async_pt_3.b === 505)
+ Remote SQL: SELECT a, b, c FROM public.base_tbl3 ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+(14 rows)
+
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+ a | b | c
+------+-----+------
+ 1505 | 505 | 0505
+ 2505 | 505 | 0505
+ 3505 | 505 | 0505
+(3 rows)
+
+-- Test async Merge Append rescan
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT
+ ARRAY(SELECT f.i FROM (SELECT b + g.i FROM async_pt WHERE a > g.i ORDER BY b) f(i) ORDER BY f.i LIMIT 10)
+FROM generate_series(1, 3) g(i);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Function Scan on pg_catalog.generate_series g
+ Output: ARRAY(SubPlan array_1)
+ Function Call: generate_series(1, 3)
+ SubPlan array_1
+ -> Limit
+ Output: f.i
+ -> Sort
+ Output: f.i
+ Sort Key: f.i
+ -> Subquery Scan on f
+ Output: f.i
+ -> Merge Append
+ Sort Key: async_pt.b
+ -> Async Foreign Scan on public.async_p1 async_pt_1
+ Output: (async_pt_1.b + g.i), async_pt_1.b
+ Remote SQL: SELECT b FROM public.base_tbl1 WHERE ((a > $1::integer)) ORDER BY b ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p2 async_pt_2
+ Output: (async_pt_2.b + g.i), async_pt_2.b
+ Remote SQL: SELECT b FROM public.base_tbl2 WHERE ((a > $1::integer)) ORDER BY b ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p3 async_pt_3
+ Output: (async_pt_3.b + g.i), async_pt_3.b
+ Remote SQL: SELECT b FROM public.base_tbl3 WHERE ((a > $1::integer)) ORDER BY b ASC NULLS LAST
+(22 rows)
+
+SELECT
+ ARRAY(SELECT f.i FROM (SELECT b + g.i FROM async_pt WHERE a > g.i ORDER BY b) f(i) ORDER BY f.i LIMIT 10)
+FROM generate_series(1, 3) g(i);
+ array
+---------------------------
+ {1,1,1,6,6,6,11,11,11,16}
+ {2,2,2,7,7,7,12,12,12,17}
+ {3,3,3,8,8,8,13,13,13,18}
+(3 rows)
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
-- Check case where the partitioned table has local/remote partitions
@@ -11639,6 +11749,37 @@ SELECT * FROM result_tbl ORDER BY a;
(3 rows)
DELETE FROM result_tbl;
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: async_pt.b, async_pt.a
+ -> Async Foreign Scan on public.async_p1 async_pt_1
+ Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
+ Filter: (async_pt_1.b === 505)
+ Remote SQL: SELECT a, b, c FROM public.base_tbl1 ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p2 async_pt_2
+ Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
+ Filter: (async_pt_2.b === 505)
+ Remote SQL: SELECT a, b, c FROM public.base_tbl2 ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+ -> Sort
+ Output: async_pt_3.a, async_pt_3.b, async_pt_3.c
+ Sort Key: async_pt_3.b, async_pt_3.a
+ -> Seq Scan on public.async_p3 async_pt_3
+ Output: async_pt_3.a, async_pt_3.b, async_pt_3.c
+ Filter: (async_pt_3.b === 505)
+(16 rows)
+
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+ a | b | c
+------+-----+------
+ 1505 | 505 | 0505
+ 2505 | 505 | 0505
+ 3505 | 505 | 0505
+(3 rows)
+
-- partitionwise joins
SET enable_partitionwise_join TO true;
CREATE TABLE join_tbl (a1 int, b1 int, c1 text, a2 int, b2 int, c2 text);
@@ -12421,6 +12562,153 @@ SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM f
DROP FOREIGN TABLE foreign_tbl CASCADE;
NOTICE: drop cascades to foreign table foreign_tbl2
DROP TABLE base_tbl;
+-- Test async Merge Append
+CREATE TABLE distr1 (i int, j int, k text) PARTITION BY HASH (i);
+CREATE TABLE base1 (i int, j int, k text);
+CREATE TABLE base2 (i int, j int, k text);
+CREATE FOREIGN TABLE distr1_p1 PARTITION OF distr1 FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+SERVER loopback OPTIONS (table_name 'base1');
+CREATE FOREIGN TABLE distr1_p2 PARTITION OF distr1 FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+SERVER loopback OPTIONS (table_name 'base2');
+CREATE TABLE distr2 (i int, j int, k text) PARTITION BY HASH (i);
+CREATE TABLE base3 (i int, j int, k text);
+CREATE TABLE base4 (i int, j int, k text);
+CREATE FOREIGN TABLE distr2_p1 PARTITION OF distr2 FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+SERVER loopback OPTIONS (table_name 'base3');
+CREATE FOREIGN TABLE distr2_p2 PARTITION OF distr2 FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+SERVER loopback OPTIONS (table_name 'base4');
+INSERT INTO distr1
+SELECT i, i*10, 'data_' || i FROM generate_series(1, 1000) i;
+INSERT INTO distr2
+SELECT i, i*10, 'data_' || i FROM generate_series(1, 100) i;
+ANALYZE distr1_p1;
+ANALYZE distr1_p2;
+ANALYZE distr2_p1;
+ANALYZE distr2_p2;
+SET enable_partitionwise_join TO ON;
+-- Test joins with async Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM distr1, distr2 WHERE distr1.i=distr2.i AND distr2.j > 90 and distr2.k like 'data%'
+ORDER BY distr2.i LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+ Output: distr1.i, distr1.j, distr1.k, distr2.i, distr2.j, distr2.k
+ -> Merge Append
+ Sort Key: distr1.i
+ -> Async Foreign Scan
+ Output: distr1_1.i, distr1_1.j, distr1_1.k, distr2_1.i, distr2_1.j, distr2_1.k
+ Relations: (public.distr1_p1 distr1_1) INNER JOIN (public.distr2_p1 distr2_1)
+ Remote SQL: SELECT r3.i, r3.j, r3.k, r5.i, r5.j, r5.k FROM (public.base1 r3 INNER JOIN public.base3 r5 ON (((r3.i = r5.i)) AND ((r5.j > 90)) AND ((r5.k ~~ 'data%')))) ORDER BY r3.i ASC NULLS LAST
+ -> Async Foreign Scan
+ Output: distr1_2.i, distr1_2.j, distr1_2.k, distr2_2.i, distr2_2.j, distr2_2.k
+ Relations: (public.distr1_p2 distr1_2) INNER JOIN (public.distr2_p2 distr2_2)
+ Remote SQL: SELECT r4.i, r4.j, r4.k, r6.i, r6.j, r6.k FROM (public.base2 r4 INNER JOIN public.base4 r6 ON (((r4.i = r6.i)) AND ((r6.j > 90)) AND ((r6.k ~~ 'data%')))) ORDER BY r4.i ASC NULLS LAST
+(12 rows)
+
+SELECT * FROM distr1, distr2 WHERE distr1.i=distr2.i AND distr2.j > 90 and distr2.k like 'data%'
+ORDER BY distr2.i LIMIT 10;
+ i | j | k | i | j | k
+----+-----+---------+----+-----+---------
+ 10 | 100 | data_10 | 10 | 100 | data_10
+ 11 | 110 | data_11 | 11 | 110 | data_11
+ 12 | 120 | data_12 | 12 | 120 | data_12
+ 13 | 130 | data_13 | 13 | 130 | data_13
+ 14 | 140 | data_14 | 14 | 140 | data_14
+ 15 | 150 | data_15 | 15 | 150 | data_15
+ 16 | 160 | data_16 | 16 | 160 | data_16
+ 17 | 170 | data_17 | 17 | 170 | data_17
+ 18 | 180 | data_18 | 18 | 180 | data_18
+ 19 | 190 | data_19 | 19 | 190 | data_19
+(10 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM distr1 LEFT JOIN distr2 ON distr1.i=distr2.i AND distr2.k like 'data%' WHERE distr1.i > 90
+ORDER BY distr1.i LIMIT 20;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+ Output: distr1.i, distr1.j, distr1.k, distr2.i, distr2.j, distr2.k
+ -> Merge Append
+ Sort Key: distr1.i
+ -> Async Foreign Scan
+ Output: distr1_1.i, distr1_1.j, distr1_1.k, distr2_1.i, distr2_1.j, distr2_1.k
+ Relations: (public.distr1_p1 distr1_1) LEFT JOIN (public.distr2_p1 distr2_1)
+ Remote SQL: SELECT r4.i, r4.j, r4.k, r6.i, r6.j, r6.k FROM (public.base1 r4 LEFT JOIN public.base3 r6 ON (((r4.i = r6.i)) AND ((r6.k ~~ 'data%')))) WHERE ((r4.i > 90)) ORDER BY r4.i ASC NULLS LAST
+ -> Async Foreign Scan
+ Output: distr1_2.i, distr1_2.j, distr1_2.k, distr2_2.i, distr2_2.j, distr2_2.k
+ Relations: (public.distr1_p2 distr1_2) LEFT JOIN (public.distr2_p2 distr2_2)
+ Remote SQL: SELECT r5.i, r5.j, r5.k, r7.i, r7.j, r7.k FROM (public.base2 r5 LEFT JOIN public.base4 r7 ON (((r5.i = r7.i)) AND ((r7.k ~~ 'data%')))) WHERE ((r5.i > 90)) ORDER BY r5.i ASC NULLS LAST
+(12 rows)
+
+SELECT * FROM distr1 LEFT JOIN distr2 ON distr1.i=distr2.i AND distr2.k like 'data%' WHERE distr1.i > 90
+ORDER BY distr1.i LIMIT 20;
+ i | j | k | i | j | k
+-----+------+----------+-----+------+----------
+ 91 | 910 | data_91 | 91 | 910 | data_91
+ 92 | 920 | data_92 | 92 | 920 | data_92
+ 93 | 930 | data_93 | 93 | 930 | data_93
+ 94 | 940 | data_94 | 94 | 940 | data_94
+ 95 | 950 | data_95 | 95 | 950 | data_95
+ 96 | 960 | data_96 | 96 | 960 | data_96
+ 97 | 970 | data_97 | 97 | 970 | data_97
+ 98 | 980 | data_98 | 98 | 980 | data_98
+ 99 | 990 | data_99 | 99 | 990 | data_99
+ 100 | 1000 | data_100 | 100 | 1000 | data_100
+ 101 | 1010 | data_101 | | |
+ 102 | 1020 | data_102 | | |
+ 103 | 1030 | data_103 | | |
+ 104 | 1040 | data_104 | | |
+ 105 | 1050 | data_105 | | |
+ 106 | 1060 | data_106 | | |
+ 107 | 1070 | data_107 | | |
+ 108 | 1080 | data_108 | | |
+ 109 | 1090 | data_109 | | |
+ 110 | 1100 | data_110 | | |
+(20 rows)
+
+-- Test pruning with async Merge Append
+DELETE FROM distr2;
+INSERT INTO distr2
+SELECT i%10, i*10, 'data_' || i FROM generate_series(1, 1000) i;
+DEALLOCATE ALL;
+SET plan_cache_mode TO force_generic_plan;
+PREPARE async_pt_query (int, int) AS
+ SELECT * FROM distr2 WHERE i = ANY(ARRAY[$1, $2])
+ ORDER BY i,j
+ LIMIT 10;
+EXPLAIN (VERBOSE, COSTS OFF)
+ EXECUTE async_pt_query(1, 1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+ Output: distr2.i, distr2.j, distr2.k
+ -> Merge Append
+ Sort Key: distr2.i, distr2.j
+ Subplans Removed: 1
+ -> Async Foreign Scan on public.distr2_p1 distr2_1
+ Output: distr2_1.i, distr2_1.j, distr2_1.k
+ Remote SQL: SELECT i, j, k FROM public.base3 WHERE ((i = ANY (ARRAY[$1::integer, $2::integer]))) ORDER BY i ASC NULLS LAST, j ASC NULLS LAST
+(8 rows)
+
+EXECUTE async_pt_query(1, 1);
+ i | j | k
+---+-----+---------
+ 1 | 10 | data_1
+ 1 | 110 | data_11
+ 1 | 210 | data_21
+ 1 | 310 | data_31
+ 1 | 410 | data_41
+ 1 | 510 | data_51
+ 1 | 610 | data_61
+ 1 | 710 | data_71
+ 1 | 810 | data_81
+ 1 | 910 | data_91
+(10 rows)
+
+RESET plan_cache_mode;
+RESET enable_partitionwise_join;
+DROP TABLE distr1, distr2, base1, base2, base3, base4;
ALTER SERVER loopback OPTIONS (DROP async_capable);
ALTER SERVER loopback2 OPTIONS (DROP async_capable);
-- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 5e178c21b39..bd551a1db72 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -7213,12 +7213,16 @@ postgresForeignAsyncConfigureWait(AsyncRequest *areq)
ForeignScanState *node = (ForeignScanState *) areq->requestee;
PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
AsyncRequest *pendingAreq = fsstate->conn_state->pendingAreq;
- AppendState *requestor = (AppendState *) areq->requestor;
- WaitEventSet *set = requestor->as_eventset;
+ PlanState *requestor = areq->requestor;
+ WaitEventSet *set;
+ Bitmapset *needrequest;
/* This should not be called unless callback_pending */
Assert(areq->callback_pending);
+ set = GetAppendEventSet(requestor);
+ needrequest = GetNeedRequest(requestor);
+
/*
* If process_pending_request() has been invoked on the given request
* before we get here, we might have some tuples already; in which case
@@ -7256,7 +7260,7 @@ postgresForeignAsyncConfigureWait(AsyncRequest *areq)
* below, because we might otherwise end up with no configured events
* other than the postmaster death event.
*/
- if (!bms_is_empty(requestor->as_needrequest))
+ if (!bms_is_empty(needrequest))
return;
if (GetNumRegisteredWaitEvents(set) > 1)
return;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 4f7ab2ed0ac..89188dff783 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3929,6 +3929,11 @@ INSERT INTO result_tbl SELECT a, b, 'AAA' || c FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
+SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
+
-- Test error handling, if accessing one of the foreign partitions errors out
CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (10000) TO (10001)
SERVER loopback OPTIONS (table_name 'non_existent_table');
@@ -3952,6 +3957,20 @@ DELETE FROM result_tbl;
-- Test COPY TO when foreign table is partition
COPY async_pt TO stdout; --error
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+
+-- Test async Merge Append rescan
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT
+ ARRAY(SELECT f.i FROM (SELECT b + g.i FROM async_pt WHERE a > g.i ORDER BY b) f(i) ORDER BY f.i LIMIT 10)
+FROM generate_series(1, 3) g(i);
+SELECT
+ ARRAY(SELECT f.i FROM (SELECT b + g.i FROM async_pt WHERE a > g.i ORDER BY b) f(i) ORDER BY f.i LIMIT 10)
+FROM generate_series(1, 3) g(i);
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
@@ -3967,6 +3986,11 @@ INSERT INTO result_tbl SELECT * FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+
-- partitionwise joins
SET enable_partitionwise_join TO true;
@@ -4205,6 +4229,69 @@ SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM f
DROP FOREIGN TABLE foreign_tbl CASCADE;
DROP TABLE base_tbl;
+-- Test async Merge Append
+CREATE TABLE distr1 (i int, j int, k text) PARTITION BY HASH (i);
+CREATE TABLE base1 (i int, j int, k text);
+CREATE TABLE base2 (i int, j int, k text);
+CREATE FOREIGN TABLE distr1_p1 PARTITION OF distr1 FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+SERVER loopback OPTIONS (table_name 'base1');
+CREATE FOREIGN TABLE distr1_p2 PARTITION OF distr1 FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+SERVER loopback OPTIONS (table_name 'base2');
+
+CREATE TABLE distr2 (i int, j int, k text) PARTITION BY HASH (i);
+CREATE TABLE base3 (i int, j int, k text);
+CREATE TABLE base4 (i int, j int, k text);
+CREATE FOREIGN TABLE distr2_p1 PARTITION OF distr2 FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+SERVER loopback OPTIONS (table_name 'base3');
+CREATE FOREIGN TABLE distr2_p2 PARTITION OF distr2 FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+SERVER loopback OPTIONS (table_name 'base4');
+
+INSERT INTO distr1
+SELECT i, i*10, 'data_' || i FROM generate_series(1, 1000) i;
+
+INSERT INTO distr2
+SELECT i, i*10, 'data_' || i FROM generate_series(1, 100) i;
+
+ANALYZE distr1_p1;
+ANALYZE distr1_p2;
+ANALYZE distr2_p1;
+ANALYZE distr2_p2;
+
+SET enable_partitionwise_join TO ON;
+
+-- Test joins with async Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM distr1, distr2 WHERE distr1.i=distr2.i AND distr2.j > 90 and distr2.k like 'data%'
+ORDER BY distr2.i LIMIT 10;
+SELECT * FROM distr1, distr2 WHERE distr1.i=distr2.i AND distr2.j > 90 and distr2.k like 'data%'
+ORDER BY distr2.i LIMIT 10;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM distr1 LEFT JOIN distr2 ON distr1.i=distr2.i AND distr2.k like 'data%' WHERE distr1.i > 90
+ORDER BY distr1.i LIMIT 20;
+SELECT * FROM distr1 LEFT JOIN distr2 ON distr1.i=distr2.i AND distr2.k like 'data%' WHERE distr1.i > 90
+ORDER BY distr1.i LIMIT 20;
+
+-- Test pruning with async Merge Append
+DELETE FROM distr2;
+INSERT INTO distr2
+SELECT i%10, i*10, 'data_' || i FROM generate_series(1, 1000) i;
+
+DEALLOCATE ALL;
+SET plan_cache_mode TO force_generic_plan;
+PREPARE async_pt_query (int, int) AS
+ SELECT * FROM distr2 WHERE i = ANY(ARRAY[$1, $2])
+ ORDER BY i,j
+ LIMIT 10;
+EXPLAIN (VERBOSE, COSTS OFF)
+ EXECUTE async_pt_query(1, 1);
+EXECUTE async_pt_query(1, 1);
+RESET plan_cache_mode;
+
+RESET enable_partitionwise_join;
+
+DROP TABLE distr1, distr2, base1, base2, base3, base4;
+
ALTER SERVER loopback OPTIONS (DROP async_capable);
ALTER SERVER loopback2 OPTIONS (DROP async_capable);
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cdfe8e376f0..f7f7e33cee6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5473,6 +5473,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-async-merge-append" xreflabel="enable_async_merge_append">
+ <term><varname>enable_async_merge_append</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_async_merge_append</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of async-aware
+ merge append plan types. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-bitmapscan" xreflabel="enable_bitmapscan">
<term><varname>enable_bitmapscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/executor/execAsync.c b/src/backend/executor/execAsync.c
index 5d3cabe73e3..6dc19ebc374 100644
--- a/src/backend/executor/execAsync.c
+++ b/src/backend/executor/execAsync.c
@@ -17,6 +17,7 @@
#include "executor/execAsync.h"
#include "executor/executor.h"
#include "executor/nodeAppend.h"
+#include "executor/nodeMergeAppend.h"
#include "executor/nodeForeignscan.h"
/*
@@ -121,6 +122,9 @@ ExecAsyncResponse(AsyncRequest *areq)
case T_AppendState:
ExecAsyncAppendResponse(areq);
break;
+ case T_MergeAppendState:
+ ExecAsyncMergeAppendResponse(areq);
+ break;
default:
/* If the node doesn't support async, caller messed up. */
elog(ERROR, "unrecognized node type: %d",
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 77c4dd9e4b1..dfbc7b510c4 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -1187,10 +1187,7 @@ ExecAsyncAppendResponse(AsyncRequest *areq)
static void
classify_matching_subplans(AppendState *node)
{
- Bitmapset *valid_asyncplans;
-
Assert(node->as_valid_subplans_identified);
- Assert(node->as_valid_asyncplans == NULL);
/* Nothing to do if there are no valid subplans. */
if (bms_is_empty(node->as_valid_subplans))
@@ -1200,21 +1197,10 @@ classify_matching_subplans(AppendState *node)
return;
}
- /* Nothing to do if there are no valid async subplans. */
- if (!bms_overlap(node->as_valid_subplans, node->as_asyncplans))
- {
+ /* No valid async subplans identified. */
+ if (!classify_matching_subplans_common(
+ &node->as_valid_subplans,
+ node->as_asyncplans,
+ &node->as_valid_asyncplans))
node->as_nasyncremain = 0;
- return;
- }
-
- /* Get valid async subplans. */
- valid_asyncplans = bms_intersect(node->as_asyncplans,
- node->as_valid_subplans);
-
- /* Adjust the valid subplans to contain sync subplans only. */
- node->as_valid_subplans = bms_del_members(node->as_valid_subplans,
- valid_asyncplans);
-
- /* Save valid async subplans. */
- node->as_valid_asyncplans = valid_asyncplans;
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 300bcd5cf33..f1c267eb9eb 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -39,10 +39,15 @@
#include "postgres.h"
#include "executor/executor.h"
+#include "executor/execAsync.h"
#include "executor/execPartition.h"
#include "executor/nodeMergeAppend.h"
#include "lib/binaryheap.h"
#include "miscadmin.h"
+#include "storage/latch.h"
+#include "utils/wait_event.h"
+
+#define EVENT_BUFFER_SIZE 16
/*
* We have one slot for each item in the heap array. We use SlotNumber
@@ -54,6 +59,12 @@ typedef int32 SlotNumber;
static TupleTableSlot *ExecMergeAppend(PlanState *pstate);
static int heap_compare_slots(Datum a, Datum b, void *arg);
+static void classify_matching_subplans(MergeAppendState *node);
+static void ExecMergeAppendAsyncBegin(MergeAppendState *node);
+static void ExecMergeAppendAsyncGetNext(MergeAppendState *node, int mplan);
+static bool ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan);
+static void ExecMergeAppendAsyncEventWait(MergeAppendState *node);
+
/* ----------------------------------------------------------------
* ExecInitMergeAppend
@@ -71,6 +82,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
int nplans;
int i,
j;
+ Bitmapset *asyncplans;
+ int nasyncplans;
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -106,7 +119,10 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* later calls to ExecFindMatchingSubPlans.
*/
if (!prunestate->do_exec_prune && nplans > 0)
+ {
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
+ mergestate->ms_valid_subplans_identified = true;
+ }
}
else
{
@@ -119,6 +135,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Assert(nplans > 0);
mergestate->ms_valid_subplans = validsubplans =
bms_add_range(NULL, 0, nplans - 1);
+ mergestate->ms_valid_subplans_identified = true;
mergestate->ms_prune_state = NULL;
}
@@ -135,11 +152,25 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* the results into the mergeplanstates array.
*/
j = 0;
+ asyncplans = NULL;
+ nasyncplans = 0;
+
i = -1;
while ((i = bms_next_member(validsubplans, i)) >= 0)
{
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
+ /*
+ * Record async subplans. When executing EvalPlanQual, we treat them
+ * as sync ones; don't do this when initializing an EvalPlanQual plan
+ * tree.
+ */
+ if (initNode->async_capable && estate->es_epq_active == NULL)
+ {
+ asyncplans = bms_add_member(asyncplans, j);
+ nasyncplans++;
+ }
+
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
}
@@ -170,6 +201,45 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
mergestate->ps.ps_ProjInfo = NULL;
+ /* Initialize async state */
+ mergestate->ms_asyncplans = asyncplans;
+ mergestate->ms_nasyncplans = nasyncplans;
+ mergestate->ms_asyncrequests = NULL;
+ mergestate->ms_asyncresults = NULL;
+ mergestate->ms_has_asyncresults = NULL;
+ mergestate->ms_asyncremain = NULL;
+ mergestate->ms_needrequest = NULL;
+ mergestate->ms_eventset = NULL;
+ mergestate->ms_valid_asyncplans = NULL;
+
+ if (nasyncplans > 0)
+ {
+ mergestate->ms_asyncrequests = (AsyncRequest **)
+ palloc0(nplans * sizeof(AsyncRequest *));
+
+ i = -1;
+ while ((i = bms_next_member(asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq;
+
+ areq = palloc(sizeof(AsyncRequest));
+ areq->requestor = (PlanState *) mergestate;
+ areq->requestee = mergeplanstates[i];
+ areq->request_index = i;
+ areq->callback_pending = false;
+ areq->request_complete = false;
+ areq->result = NULL;
+
+ mergestate->ms_asyncrequests[i] = areq;
+ }
+
+ mergestate->ms_asyncresults = (TupleTableSlot **)
+ palloc0(nplans * sizeof(TupleTableSlot *));
+
+ if (mergestate->ms_valid_subplans_identified)
+ classify_matching_subplans(mergestate);
+ }
+
/*
* initialize sort-key information
*/
@@ -226,14 +296,18 @@ ExecMergeAppend(PlanState *pstate)
if (node->ms_nplans == 0)
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
- /*
- * If we've yet to determine the valid subplans then do so now. If
- * run-time pruning is disabled then the valid subplans will always be
- * set to all subplans.
- */
- if (node->ms_valid_subplans == NULL)
+ /* If we've yet to determine the valid subplans then do so now. */
+ if (!node->ms_valid_subplans_identified)
+ {
node->ms_valid_subplans =
ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
+ node->ms_valid_subplans_identified = true;
+ classify_matching_subplans(node);
+ }
+
+ /* If there are any async subplans, begin executing them. */
+ if (node->ms_nasyncplans > 0)
+ ExecMergeAppendAsyncBegin(node);
/*
* First time through: pull the first tuple from each valid subplan,
@@ -246,6 +320,16 @@ ExecMergeAppend(PlanState *pstate)
if (!TupIsNull(node->ms_slots[i]))
binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
}
+
+ /* Look at valid async subplans */
+ i = -1;
+ while ((i = bms_next_member(node->ms_valid_asyncplans, i)) >= 0)
+ {
+ ExecMergeAppendAsyncGetNext(node, i);
+ if (!TupIsNull(node->ms_slots[i]))
+ binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
+ }
+
binaryheap_build(node->ms_heap);
node->ms_initialized = true;
}
@@ -260,7 +344,13 @@ ExecMergeAppend(PlanState *pstate)
* to not pull tuples until necessary.)
*/
i = DatumGetInt32(binaryheap_first(node->ms_heap));
- node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+ if (bms_is_member(i, node->ms_asyncplans))
+ ExecMergeAppendAsyncGetNext(node, i);
+ else
+ {
+ Assert(bms_is_member(i, node->ms_valid_subplans));
+ node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+ }
if (!TupIsNull(node->ms_slots[i]))
binaryheap_replace_first(node->ms_heap, Int32GetDatum(i));
else
@@ -276,6 +366,8 @@ ExecMergeAppend(PlanState *pstate)
{
i = DatumGetInt32(binaryheap_first(node->ms_heap));
result = node->ms_slots[i];
+ /* For async plan record that we can get the next tuple */
+ node->ms_has_asyncresults = bms_del_member(node->ms_has_asyncresults, i);
}
return result;
@@ -355,6 +447,7 @@ void
ExecReScanMergeAppend(MergeAppendState *node)
{
int i;
+ int nasyncplans = node->ms_nasyncplans;
/*
* If any PARAM_EXEC Params used in pruning expressions have changed, then
@@ -365,8 +458,11 @@ ExecReScanMergeAppend(MergeAppendState *node)
bms_overlap(node->ps.chgParam,
node->ms_prune_state->execparamids))
{
+ node->ms_valid_subplans_identified = false;
bms_free(node->ms_valid_subplans);
node->ms_valid_subplans = NULL;
+ bms_free(node->ms_valid_asyncplans);
+ node->ms_valid_asyncplans = NULL;
}
for (i = 0; i < node->ms_nplans; i++)
@@ -387,6 +483,367 @@ ExecReScanMergeAppend(MergeAppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ /* Reset async state */
+ if (nasyncplans > 0)
+ {
+ i = -1;
+ while ((i = bms_next_member(node->ms_asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->ms_asyncrequests[i];
+
+ areq->callback_pending = false;
+ areq->request_complete = false;
+ areq->result = NULL;
+ }
+
+ bms_free(node->ms_asyncremain);
+ node->ms_asyncremain = NULL;
+ bms_free(node->ms_needrequest);
+ node->ms_needrequest = NULL;
+ bms_free(node->ms_has_asyncresults);
+ node->ms_has_asyncresults = NULL;
+ }
binaryheap_reset(node->ms_heap);
node->ms_initialized = false;
}
+
+/* ----------------------------------------------------------------
+ * classify_matching_subplans
+ *
+ * Classify the node's ms_valid_subplans into sync ones and
+ * async ones, adjust it to contain sync ones only, and save
+ * async ones in the node's ms_valid_asyncplans.
+ * ----------------------------------------------------------------
+ */
+static void
+classify_matching_subplans(MergeAppendState *node)
+{
+ Assert(node->ms_valid_subplans_identified);
+
+ /* Nothing to do if there are no valid subplans. */
+ if (bms_is_empty(node->ms_valid_subplans))
+ {
+ node->ms_asyncremain = NULL;
+ return;
+ }
+
+ /* No valid async subplans identified. */
+ if (!classify_matching_subplans_common(
+ &node->ms_valid_subplans,
+ node->ms_asyncplans,
+ &node->ms_valid_asyncplans))
+ node->ms_asyncremain = NULL;
+}
+
+/* ----------------------------------------------------------------
+ * ExecMergeAppendAsyncBegin
+ *
+ * Begin executing designed async-capable subplans.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecMergeAppendAsyncBegin(MergeAppendState *node)
+{
+ int i;
+
+ /* Backward scan is not supported by async-aware MergeAppends. */
+ Assert(ScanDirectionIsForward(node->ps.state->es_direction));
+
+ /* We should never be called when there are no subplans */
+ Assert(node->ms_nplans > 0);
+
+ /* We should never be called when there are no async subplans. */
+ Assert(node->ms_nasyncplans > 0);
+
+ /* ExecMergeAppend() identifies valid subplans */
+ Assert(node->ms_valid_subplans_identified);
+
+ /* Initialize state variables. */
+ node->ms_asyncremain = bms_copy(node->ms_valid_asyncplans);
+
+ /* Nothing to do if there are no valid async subplans. */
+ if (bms_is_empty(node->ms_asyncremain))
+ return;
+
+ /* Make a request for each of the valid async subplans. */
+ i = -1;
+ while ((i = bms_next_member(node->ms_valid_asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->ms_asyncrequests[i];
+
+ Assert(areq->request_index == i);
+ Assert(!areq->callback_pending);
+
+ /* Do the actual work. */
+ ExecAsyncRequest(areq);
+ }
+}
+
+/* ----------------------------------------------------------------
+ * ExecMergeAppendAsyncGetNext
+ *
+ * Get the next tuple from specified asynchronous subplan.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecMergeAppendAsyncGetNext(MergeAppendState *node, int mplan)
+{
+ node->ms_slots[mplan] = NULL;
+
+ /* Request a tuple asynchronously. */
+ if (ExecMergeAppendAsyncRequest(node, mplan))
+ return;
+
+ /*
+ * node->ms_asyncremain can be NULL if we have fetched tuples, but haven't
+ * returned them yet. In this case ExecMergeAppendAsyncRequest() above
+ * just returns tuples without performing a request.
+ */
+ while (bms_is_member(mplan, node->ms_asyncremain))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Wait or poll for async events. */
+ ExecMergeAppendAsyncEventWait(node);
+
+ /* Request a tuple asynchronously. */
+ if (ExecMergeAppendAsyncRequest(node, mplan))
+ return;
+
+ /*
+ * Waiting until there's no async requests pending or we got some
+ * tuples from our request
+ */
+ }
+
+ /* No tuples */
+ return;
+}
+
+/* ----------------------------------------------------------------
+ * ExecMergeAppendAsyncRequest
+ *
+ * Request a tuple asynchronously.
+ * ----------------------------------------------------------------
+ */
+static bool
+ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
+{
+ Bitmapset *needrequest;
+ int i;
+
+ /*
+ * If we've already fetched necessary data, just return it
+ */
+ if (bms_is_member(mplan, node->ms_has_asyncresults))
+ {
+ node->ms_slots[mplan] = node->ms_asyncresults[mplan];
+ return true;
+ }
+
+ /*
+ * Get a list of members which can process request and don't have data
+ * ready.
+ */
+ needrequest = NULL;
+ i = -1;
+ while ((i = bms_next_member(node->ms_needrequest, i)) >= 0)
+ {
+ if (!bms_is_member(i, node->ms_has_asyncresults))
+ needrequest = bms_add_member(needrequest, i);
+ }
+
+ /*
+ * If there's no members, which still need request, no need to send it.
+ */
+ if (bms_is_empty(needrequest))
+ return false;
+
+ /* Clear ms_needrequest flag, as we are going to send requests now */
+ node->ms_needrequest = bms_del_members(node->ms_needrequest, needrequest);
+
+ /* Make a new request for each of the async subplans that need it. */
+ i = -1;
+ while ((i = bms_next_member(needrequest, i)) >= 0)
+ {
+ AsyncRequest *areq = node->ms_asyncrequests[i];
+
+ /*
+ * We've just checked that subplan doesn't already have some fetched
+ * data
+ */
+ Assert(!bms_is_member(i, node->ms_has_asyncresults));
+
+ /* Do the actual work. */
+ ExecAsyncRequest(areq);
+ }
+ bms_free(needrequest);
+
+ /* Return needed asynchronously-generated results if any. */
+ if (bms_is_member(mplan, node->ms_has_asyncresults))
+ {
+ node->ms_slots[mplan] = node->ms_asyncresults[mplan];
+ return true;
+ }
+
+ return false;
+}
+
+/* ----------------------------------------------------------------
+ * ExecAsyncMergeAppendResponse
+ *
+ * Receive a response from an asynchronous request we made.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAsyncMergeAppendResponse(AsyncRequest *areq)
+{
+ MergeAppendState *node = (MergeAppendState *) areq->requestor;
+ TupleTableSlot *slot = areq->result;
+
+ /* The result should be a TupleTableSlot or NULL. */
+ Assert(slot == NULL || IsA(slot, TupleTableSlot));
+ /* We should handle previous async result prior to getting new one */
+ Assert(!bms_is_member(areq->request_index, node->ms_has_asyncresults));
+
+ node->ms_asyncresults[areq->request_index] = NULL;
+ /* Nothing to do if the request is pending. */
+ if (!areq->request_complete)
+ {
+ /* The request would have been pending for a callback. */
+ Assert(areq->callback_pending);
+ return;
+ }
+
+ /* If the result is NULL or an empty slot, there's nothing more to do. */
+ if (TupIsNull(slot))
+ {
+ /* The ending subplan wouldn't have been pending for a callback. */
+ Assert(!areq->callback_pending);
+ node->ms_asyncremain = bms_del_member(node->ms_asyncremain,
+ areq->request_index);
+ return;
+ }
+
+ /* Mark that the async request has a result */
+ node->ms_has_asyncresults = bms_add_member(node->ms_has_asyncresults,
+ areq->request_index);
+ /* Save result so we can return it. */
+ node->ms_asyncresults[areq->request_index] = slot;
+
+ /*
+ * Mark the subplan that returned a result as ready for a new request. We
+ * don't launch another one here immediately because it might complete.
+ */
+ node->ms_needrequest = bms_add_member(node->ms_needrequest,
+ areq->request_index);
+}
+
+/* ----------------------------------------------------------------
+ * ExecMergeAppendAsyncEventWait
+ *
+ * Wait or poll for file descriptor events and fire callbacks.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecMergeAppendAsyncEventWait(MergeAppendState *node)
+{
+ int nevents = node->ms_nasyncplans + 2; /* one for PM death and
+ * one for latch */
+ WaitEvent occurred_event[EVENT_BUFFER_SIZE];
+ int noccurred;
+ int i;
+
+ /* We should never be called when there are no valid async subplans. */
+ Assert(bms_num_members(node->ms_asyncremain) > 0);
+
+ node->ms_eventset = CreateWaitEventSet(CurrentResourceOwner, nevents);
+ AddWaitEventToSet(node->ms_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
+ NULL, NULL);
+
+ /* Give each waiting subplan a chance to add an event. */
+ i = -1;
+ while ((i = bms_next_member(node->ms_asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->ms_asyncrequests[i];
+
+ if (areq->callback_pending)
+ ExecAsyncConfigureWait(areq);
+ }
+
+ /*
+ * No need for further processing if none of the subplans configured any
+ * events.
+ */
+ if (GetNumRegisteredWaitEvents(node->ms_eventset) == 1)
+ {
+ FreeWaitEventSet(node->ms_eventset);
+ node->ms_eventset = NULL;
+ return;
+ }
+
+ /*
+ * Add the process latch to the set, so that we wake up to process the
+ * standard interrupts with CHECK_FOR_INTERRUPTS().
+ *
+ * NOTE: For historical reasons, it's important that this is added to the
+ * WaitEventSet after the ExecAsyncConfigureWait() calls. Namely,
+ * postgres_fdw calls "GetNumRegisteredWaitEvents(set) == 1" to check if
+ * any other events are in the set. That's a poor design, it's
+ * questionable for postgres_fdw to be doing that in the first place, but
+ * we cannot change it now. The pattern has possibly been copied to other
+ * extensions too.
+ */
+ AddWaitEventToSet(node->ms_eventset, WL_LATCH_SET, PGINVALID_SOCKET,
+ MyLatch, NULL);
+
+ /* Return at most EVENT_BUFFER_SIZE events in one call. */
+ if (nevents > EVENT_BUFFER_SIZE)
+ nevents = EVENT_BUFFER_SIZE;
+
+ /*
+ * Wait until at least one event occurs.
+ */
+ noccurred = WaitEventSetWait(node->ms_eventset, -1 /* no timeout */ , occurred_event,
+ nevents, WAIT_EVENT_APPEND_READY);
+ FreeWaitEventSet(node->ms_eventset);
+ node->ms_eventset = NULL;
+ if (noccurred == 0)
+ return;
+
+ /* Deliver notifications. */
+ for (i = 0; i < noccurred; i++)
+ {
+ WaitEvent *w = &occurred_event[i];
+
+ /*
+ * Each waiting subplan should have registered its wait event with
+ * user_data pointing back to its AsyncRequest.
+ */
+ if ((w->events & WL_SOCKET_READABLE) != 0)
+ {
+ AsyncRequest *areq = (AsyncRequest *) w->user_data;
+
+ if (areq->callback_pending)
+ {
+ /*
+ * Mark it as no longer needing a callback. We must do this
+ * before dispatching the callback in case the callback resets
+ * the flag.
+ */
+ areq->callback_pending = false;
+
+ /* Do the actual work. */
+ ExecAsyncNotify(areq);
+ }
+ }
+
+ /* Handle standard interrupts */
+ if ((w->events & WL_LATCH_SET) != 0)
+ {
+ ResetLatch(MyLatch);
+ CHECK_FOR_INTERRUPTS();
+ }
+ }
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 54931cd6e2a..e7d741a7699 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -163,6 +163,7 @@ bool enable_parallel_hash = true;
bool enable_partition_pruning = true;
bool enable_presorted_aggregate = true;
bool enable_async_append = true;
+bool enable_async_merge_append = true;
typedef struct
{
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 05fac693989..fc0e5c62ea9 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1466,6 +1466,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
+ bool consider_async = false;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1480,6 +1481,10 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
plan->righttree = NULL;
node->apprelids = rel->relids;
+ consider_async = (enable_async_merge_append &&
+ !best_path->path.parallel_safe &&
+ list_length(best_path->subpaths) > 1);
+
/*
* Compute sort column info, and adjust MergeAppend's tlist as needed.
* Because we pass adjust_tlist_in_place = true, we may ignore the
@@ -1580,6 +1585,10 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = sort_plan;
}
+ /* If needed, check to see if subplan can be executed asynchronously */
+ if (consider_async)
+ mark_async_capable_plan(subplan, subpath);
+
subplans = lappend(subplans, subplan);
}
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index ac0c7c36c56..b60b23fa4b9 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -821,6 +821,14 @@
boot_val => 'true',
},
+{ name => 'enable_async_merge_append', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+ short_desc => 'Enables the planner\'s use of async merge append plans.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'enable_async_merge_append',
+ boot_val => 'true',
+},
+
+
{ name => 'enable_bitmapscan', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
short_desc => 'Enables the planner\'s use of bitmap-scan plans.',
flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dc9e2255f8a..d949d2aad04 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -405,6 +405,7 @@
# - Planner Method Configuration -
#enable_async_append = on
+#enable_async_merge_append = on
#enable_bitmapscan = on
#enable_gathermerge = on
#enable_hashagg = on
diff --git a/src/include/executor/nodeMergeAppend.h b/src/include/executor/nodeMergeAppend.h
index 4eb05dc30d6..e3fdb26ece6 100644
--- a/src/include/executor/nodeMergeAppend.h
+++ b/src/include/executor/nodeMergeAppend.h
@@ -19,5 +19,6 @@
extern MergeAppendState *ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags);
extern void ExecEndMergeAppend(MergeAppendState *node);
extern void ExecReScanMergeAppend(MergeAppendState *node);
+extern void ExecAsyncMergeAppendResponse(AsyncRequest *areq);
#endif /* NODEMERGEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..5887cbf4f16 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1545,10 +1545,69 @@ typedef struct MergeAppendState
TupleTableSlot **ms_slots; /* array of length ms_nplans */
struct binaryheap *ms_heap; /* binary heap of slot indices */
bool ms_initialized; /* are subplans started? */
+ Bitmapset *ms_asyncplans; /* asynchronous plans indexes */
+ int ms_nasyncplans; /* # of asynchronous plans */
+ AsyncRequest **ms_asyncrequests; /* array of AsyncRequests */
+ TupleTableSlot **ms_asyncresults; /* unreturned results of async plans */
+ Bitmapset *ms_has_asyncresults; /* plans which have async results */
+ Bitmapset *ms_asyncremain; /* remaining asynchronous plans */
+ Bitmapset *ms_needrequest; /* asynchronous plans needing a new request */
+ struct WaitEventSet *ms_eventset; /* WaitEventSet used to configure file
+ * descriptor wait events */
struct PartitionPruneState *ms_prune_state;
+ bool ms_valid_subplans_identified; /* is ms_valid_subplans valid? */
Bitmapset *ms_valid_subplans;
+ Bitmapset *ms_valid_asyncplans; /* valid asynchronous plans indexes */
} MergeAppendState;
+/* Getters for AppendState and MergeAppendState */
+static inline struct WaitEventSet *
+GetAppendEventSet(PlanState *ps)
+{
+ Assert(IsA(ps, AppendState) || IsA(ps, MergeAppendState));
+
+ if (IsA(ps, AppendState))
+ return ((AppendState *) ps)->as_eventset;
+ else
+ return ((MergeAppendState *) ps)->ms_eventset;
+}
+
+static inline Bitmapset *
+GetNeedRequest(PlanState *ps)
+{
+ Assert(IsA(ps, AppendState) || IsA(ps, MergeAppendState));
+
+ if (IsA(ps, AppendState))
+ return ((AppendState *) ps)->as_needrequest;
+ else
+ return ((MergeAppendState *) ps)->ms_needrequest;
+}
+
+/* Common part of classify_matching_subplans() for Append and MergeAppend */
+static inline bool
+classify_matching_subplans_common(Bitmapset **valid_subplans,
+ Bitmapset *asyncplans,
+ Bitmapset **valid_asyncplans)
+{
+ Assert(*valid_asyncplans == NULL);
+
+ /* Checked by classify_matching_subplans() */
+ Assert(!bms_is_empty(*valid_subplans));
+
+ /* Nothing to do if there are no valid async subplans. */
+ if (!bms_overlap(*valid_subplans, asyncplans))
+ return false;
+
+ /* Get valid async subplans. */
+ *valid_asyncplans = bms_intersect(asyncplans,
+ *valid_subplans);
+
+ /* Adjust the valid subplans to contain sync subplans only. */
+ *valid_subplans = bms_del_members(*valid_subplans,
+ *valid_asyncplans);
+ return true;
+}
+
/* ----------------
* RecursiveUnionState information
*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b523bcda8f3..fee491b77ad 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -70,6 +70,7 @@ extern PGDLLIMPORT bool enable_parallel_hash;
extern PGDLLIMPORT bool enable_partition_pruning;
extern PGDLLIMPORT bool enable_presorted_aggregate;
extern PGDLLIMPORT bool enable_async_append;
+extern PGDLLIMPORT bool enable_async_merge_append;
extern PGDLLIMPORT int constraint_exclusion;
extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0411db832f1..194b1f95289 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -149,6 +149,7 @@ select name, setting from pg_settings where name like 'enable%';
name | setting
--------------------------------+---------
enable_async_append | on
+ enable_async_merge_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
enable_eager_aggregate | on
@@ -173,7 +174,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(25 rows)
+(26 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
--
2.51.2
From f26775222d238441a5bd2fa60757d09cf50fd2cd Mon Sep 17 00:00:00 2001
From: Matheus Alcantara <[email protected]>
Date: Tue, 16 Dec 2025 16:32:14 -0300
Subject: [PATCH v12 3/3] Common infrastructure to MergeAppend and Append nodes
This commit introduces the execAppend.c file to centralize executor
operations that can be shared between MergeAppend and Append nodes.
The logic for initializing and ending the nodes, as well as initializing
asynchronous execution, is very similar for both MergeAppend and Append.
To reduce redundancy, the duplicated code has been moved to execAppend.c
so that these nodes can reuse the common implementation.
The code responsible for actually fetching the tuples remains specific
to each node and was not refactored into execAppend.c to preserve the
unique execution requirements of each node type.
---
contrib/pg_overexplain/pg_overexplain.c | 4 +-
contrib/postgres_fdw/postgres_fdw.c | 8 +-
src/backend/commands/explain.c | 26 +-
src/backend/executor/Makefile | 1 +
src/backend/executor/execAmi.c | 2 +-
src/backend/executor/execAppend.c | 409 +++++++++++++++++++
src/backend/executor/execCurrent.c | 4 +-
src/backend/executor/execProcnode.c | 8 +-
src/backend/executor/meson.build | 1 +
src/backend/executor/nodeAppend.c | 497 +++++-------------------
src/backend/executor/nodeMergeAppend.c | 416 +++-----------------
src/backend/nodes/nodeFuncs.c | 8 +-
src/backend/optimizer/plan/createplan.c | 34 +-
src/backend/optimizer/plan/setrefs.c | 44 +--
src/backend/optimizer/plan/subselect.c | 4 +-
src/backend/utils/adt/ruleutils.c | 8 +-
src/include/executor/execAppend.h | 60 +++
src/include/nodes/execnodes.h | 105 ++---
src/include/nodes/plannodes.h | 45 +--
19 files changed, 746 insertions(+), 938 deletions(-)
create mode 100644 src/backend/executor/execAppend.c
create mode 100644 src/include/executor/execAppend.h
diff --git a/contrib/pg_overexplain/pg_overexplain.c b/contrib/pg_overexplain/pg_overexplain.c
index fcdc17012da..7f18c2ab06c 100644
--- a/contrib/pg_overexplain/pg_overexplain.c
+++ b/contrib/pg_overexplain/pg_overexplain.c
@@ -228,12 +228,12 @@ overexplain_per_node_hook(PlanState *planstate, List *ancestors,
break;
case T_Append:
overexplain_bitmapset("Append RTIs",
- ((Append *) plan)->apprelids,
+ ((Append *) plan)->ap.apprelids,
es);
break;
case T_MergeAppend:
overexplain_bitmapset("Append RTIs",
- ((MergeAppend *) plan)->apprelids,
+ ((MergeAppend *) plan)->ap.apprelids,
es);
break;
case T_Result:
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index bd551a1db72..b01ad40ad17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2412,8 +2412,8 @@ find_modifytable_subplan(PlannerInfo *root,
{
Append *appendplan = (Append *) subplan;
- if (subplan_index < list_length(appendplan->appendplans))
- subplan = (Plan *) list_nth(appendplan->appendplans, subplan_index);
+ if (subplan_index < list_length(appendplan->ap.subplans))
+ subplan = (Plan *) list_nth(appendplan->ap.subplans, subplan_index);
}
else if (IsA(subplan, Result) &&
outerPlan(subplan) != NULL &&
@@ -2421,8 +2421,8 @@ find_modifytable_subplan(PlannerInfo *root,
{
Append *appendplan = (Append *) outerPlan(subplan);
- if (subplan_index < list_length(appendplan->appendplans))
- subplan = (Plan *) list_nth(appendplan->appendplans, subplan_index);
+ if (subplan_index < list_length(appendplan->ap.subplans))
+ subplan = (Plan *) list_nth(appendplan->ap.subplans, subplan_index);
}
/* Now, have we got a ForeignScan on the desired rel? */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5a6390631eb..3eaa1f7459e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1224,11 +1224,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
break;
case T_Append:
*rels_used = bms_add_members(*rels_used,
- ((Append *) plan)->apprelids);
+ ((Append *) plan)->ap.apprelids);
break;
case T_MergeAppend:
*rels_used = bms_add_members(*rels_used,
- ((MergeAppend *) plan)->apprelids);
+ ((MergeAppend *) plan)->ap.apprelids);
break;
case T_Result:
*rels_used = bms_add_members(*rels_used,
@@ -1272,7 +1272,7 @@ plan_is_disabled(Plan *plan)
* includes any run-time pruned children. Ignoring those could give
* us the incorrect number of disabled nodes.
*/
- foreach(lc, aplan->appendplans)
+ foreach(lc, aplan->ap.subplans)
{
Plan *subplan = lfirst(lc);
@@ -1289,7 +1289,7 @@ plan_is_disabled(Plan *plan)
* includes any run-time pruned children. Ignoring those could give
* us the incorrect number of disabled nodes.
*/
- foreach(lc, maplan->mergeplans)
+ foreach(lc, maplan->ap.subplans)
{
Plan *subplan = lfirst(lc);
@@ -2336,13 +2336,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
switch (nodeTag(plan))
{
case T_Append:
- ExplainMissingMembers(((AppendState *) planstate)->as_nplans,
- list_length(((Append *) plan)->appendplans),
+ ExplainMissingMembers(((AppendState *) planstate)->as.nplans,
+ list_length(((Append *) plan)->ap.subplans),
es);
break;
case T_MergeAppend:
- ExplainMissingMembers(((MergeAppendState *) planstate)->ms_nplans,
- list_length(((MergeAppend *) plan)->mergeplans),
+ ExplainMissingMembers(((MergeAppendState *) planstate)->ms.nplans,
+ list_length(((MergeAppend *) plan)->ap.subplans),
es);
break;
default:
@@ -2386,13 +2386,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
switch (nodeTag(plan))
{
case T_Append:
- ExplainMemberNodes(((AppendState *) planstate)->appendplans,
- ((AppendState *) planstate)->as_nplans,
+ ExplainMemberNodes(((AppendState *) planstate)->as.plans,
+ ((AppendState *) planstate)->as.nplans,
ancestors, es);
break;
case T_MergeAppend:
- ExplainMemberNodes(((MergeAppendState *) planstate)->mergeplans,
- ((MergeAppendState *) planstate)->ms_nplans,
+ ExplainMemberNodes(((MergeAppendState *) planstate)->ms.plans,
+ ((MergeAppendState *) planstate)->ms.nplans,
ancestors, es);
break;
case T_BitmapAnd:
@@ -2606,7 +2606,7 @@ static void
show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
ExplainState *es)
{
- MergeAppend *plan = (MergeAppend *) mstate->ps.plan;
+ MergeAppend *plan = (MergeAppend *) mstate->ms.ps.plan;
show_sort_group_keys((PlanState *) mstate, "Sort Key",
plan->numCols, 0, plan->sortColIdx,
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 11118d0ce02..66b62fca921 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
execAmi.o \
execAsync.o \
+ execAppend.o \
execCurrent.o \
execExpr.o \
execExprInterp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 1d0e8ad57b4..5c897048ba3 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -537,7 +537,7 @@ ExecSupportsBackwardScan(Plan *node)
if (((Append *) node)->nasyncplans > 0)
return false;
- foreach(l, ((Append *) node)->appendplans)
+ foreach(l, ((Append *) node)->ap.subplans)
{
if (!ExecSupportsBackwardScan((Plan *) lfirst(l)))
return false;
diff --git a/src/backend/executor/execAppend.c b/src/backend/executor/execAppend.c
new file mode 100644
index 00000000000..b7222ab7940
--- /dev/null
+++ b/src/backend/executor/execAppend.c
@@ -0,0 +1,409 @@
+/*-------------------------------------------------------------------------
+ *
+ * execAppend.c
+ * This code provides support functions for executing MergeAppend and Append
+ * nodes.
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/execAppend.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "executor/executor.h"
+#include "executor/execAppend.h"
+#include "executor/execAsync.h"
+#include "executor/execPartition.h"
+#include "storage/latch.h"
+#include "storage/waiteventset.h"
+#include "miscadmin.h"
+
+#define EVENT_BUFFER_SIZE 16
+
+/* Begin all of the subscans of an Appender node. */
+void
+ExecInitAppender(AppenderState * state,
+ Appender * node,
+ EState *estate,
+ int eflags,
+ int first_partial_plan,
+ int *first_valid_partial_plan)
+{
+ PlanState **appendplanstates;
+ const TupleTableSlotOps *appendops;
+ Bitmapset *validsubplans;
+ Bitmapset *asyncplans;
+ int nplans;
+ int nasyncplans;
+ int firstvalid;
+ int i,
+ j;
+
+ /* If run-time partition pruning is enabled, then set that up now */
+ if (node->part_prune_index >= 0)
+ {
+ PartitionPruneState *prunestate;
+
+ /*
+ * Set up pruning data structure. This also initializes the set of
+ * subplans to initialize (validsubplans) by taking into account the
+ * result of performing initial pruning if any.
+ */
+ prunestate = ExecInitPartitionExecPruning(&state->ps,
+ list_length(node->subplans),
+ node->part_prune_index,
+ node->apprelids,
+ &validsubplans);
+ state->prune_state = prunestate;
+ nplans = bms_num_members(validsubplans);
+
+ /*
+ * When no run-time pruning is required and there's at least one
+ * subplan, we can fill valid_subplans immediately, preventing
+ * later calls to ExecFindMatchingSubPlans.
+ */
+ if (!prunestate->do_exec_prune && nplans > 0)
+ {
+ state->valid_subplans = bms_add_range(NULL, 0, nplans - 1);
+ state->valid_subplans_identified = true;
+ }
+ }
+ else
+ {
+ nplans = list_length(node->subplans);
+
+ /*
+ * When run-time partition pruning is not enabled we can just mark all
+ * subplans as valid; they must also all be initialized.
+ */
+ Assert(nplans > 0);
+ state->valid_subplans = validsubplans =
+ bms_add_range(NULL, 0, nplans - 1);
+ state->valid_subplans_identified = true;
+ state->prune_state = NULL;
+ }
+
+ appendplanstates = palloc0_array(PlanState *, nplans);
+
+ /*
+ * call ExecInitNode on each of the valid plans to be executed and save
+ * the results into the appendplanstates array.
+ *
+ * While at it, find out the first valid partial plan.
+ */
+ j = 0;
+ asyncplans = NULL;
+ nasyncplans = 0;
+ firstvalid = nplans;
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *initNode = (Plan *) list_nth(node->subplans, i);
+
+ /*
+ * Record async subplans. When executing EvalPlanQual, we treat them
+ * as sync ones; don't do this when initializing an EvalPlanQual plan
+ * tree.
+ */
+ if (initNode->async_capable && estate->es_epq_active == NULL)
+ {
+ asyncplans = bms_add_member(asyncplans, j);
+ nasyncplans++;
+ }
+
+ /*
+ * Record the lowest appendplans index which is a valid partial plan.
+ */
+ if (first_valid_partial_plan && i >= first_partial_plan && j < firstvalid)
+ firstvalid = j;
+
+ appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ }
+
+ if (first_valid_partial_plan)
+ *first_valid_partial_plan = firstvalid;
+
+ state->plans = appendplanstates;
+ state->nplans = nplans;
+
+ /*
+ * Initialize Append's result tuple type and slot. If the child plans all
+ * produce the same fixed slot type, we can use that slot type; otherwise
+ * make a virtual slot. (Note that the result slot itself is used only to
+ * return a null tuple at end of execution; real tuples are returned to
+ * the caller in the children's own result slots. What we are doing here
+ * is allowing the parent plan node to optimize if the Append will return
+ * only one kind of slot.)
+ */
+ appendops = ExecGetCommonSlotOps(appendplanstates, j);
+ if (appendops != NULL)
+ {
+ ExecInitResultTupleSlotTL(&state->ps, appendops);
+ }
+ else
+ {
+ ExecInitResultTupleSlotTL(&state->ps, &TTSOpsVirtual);
+ /* show that the output slot type is not fixed */
+ state->ps.resultopsset = true;
+ state->ps.resultopsfixed = false;
+ }
+
+ /* Initialize async state */
+ state->asyncplans = asyncplans;
+ state->nasyncplans = nasyncplans;
+ state->asyncrequests = NULL;
+ state->asyncresults = NULL;
+ state->needrequest = NULL;
+ state->eventset = NULL;
+ state->valid_asyncplans = NULL;
+
+ if (nasyncplans > 0)
+ {
+ state->asyncrequests = palloc0_array(AsyncRequest *, nplans);
+
+ i = -1;
+ while ((i = bms_next_member(asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq;
+
+ areq = palloc_object(AsyncRequest);
+ areq->requestor = (PlanState *) state;
+ areq->requestee = appendplanstates[i];
+ areq->request_index = i;
+ areq->callback_pending = false;
+ areq->request_complete = false;
+ areq->result = NULL;
+
+ state->asyncrequests[i] = areq;
+ }
+
+ /*
+ * AppendState and MergeAppendState have slightly different allocation
+ * sizes for asyncresults in the original code, but we unify to the
+ * larger requirement or specific nplans if required.
+ */
+ state->asyncresults = palloc0_array(TupleTableSlot *, nplans);
+ }
+
+ /*
+ * Miscellaneous initialization
+ */
+ state->ps.ps_ProjInfo = NULL;
+}
+
+void
+ExecReScanAppender(AppenderState * node)
+{
+ int i;
+ int nasyncplans = node->nasyncplans;
+
+ /*
+ * If any PARAM_EXEC Params used in pruning expressions have changed, then
+ * we'd better unset the valid subplans so that they are reselected for
+ * the new parameter values.
+ */
+ if (node->prune_state &&
+ bms_overlap(node->ps.chgParam,
+ node->prune_state->execparamids))
+ {
+ node->valid_subplans_identified = false;
+ bms_free(node->valid_subplans);
+ node->valid_subplans = NULL;
+ bms_free(node->valid_asyncplans);
+ node->valid_asyncplans = NULL;
+ }
+
+ for (i = 0; i < node->nplans; i++)
+ {
+ PlanState *subnode = node->plans[i];
+
+ /*
+ * ExecReScan doesn't know about my subplans, so I have to do
+ * changed-parameter signaling myself.
+ */
+ if (node->ps.chgParam != NULL)
+ UpdateChangedParamSet(subnode, node->ps.chgParam);
+
+ /*
+ * If chgParam of subnode is not null then plan will be re-scanned by
+ * first ExecProcNode.
+ */
+ if (subnode->chgParam == NULL)
+ ExecReScan(subnode);
+ }
+
+ /* Reset async state */
+ if (nasyncplans > 0)
+ {
+ i = -1;
+ while ((i = bms_next_member(node->asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->asyncrequests[i];
+
+ areq->callback_pending = false;
+ areq->request_complete = false;
+ areq->result = NULL;
+ }
+
+ bms_free(node->needrequest);
+ node->needrequest = NULL;
+ }
+}
+
+/* Wait or poll for file descriptor events and fire callbacks. */
+void
+ExecAppenderAsyncEventWait(AppenderState * node, int timeout, uint32 wait_event_info)
+{
+ int nevents = node->nasyncplans + 2; /* one for PM death and
+ * one for latch */
+ int noccurred;
+ int i;
+ WaitEvent occurred_event[EVENT_BUFFER_SIZE];
+
+ Assert(node->eventset == NULL);
+
+ node->eventset = CreateWaitEventSet(CurrentResourceOwner, nevents);
+ AddWaitEventToSet(node->eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
+ NULL, NULL);
+
+ /* Give each waiting subplan a chance to add an event. */
+ i = -1;
+ while ((i = bms_next_member(node->asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->asyncrequests[i];
+
+ if (areq->callback_pending)
+ ExecAsyncConfigureWait(areq);
+ }
+
+ /*
+ * No need for further processing if none of the subplans configured any
+ * events.
+ */
+ if (GetNumRegisteredWaitEvents(node->eventset) == 1)
+ {
+ FreeWaitEventSet(node->eventset);
+ node->eventset = NULL;
+ return;
+ }
+
+ /*
+ * Add the process latch to the set, so that we wake up to process the
+ * standard interrupts with CHECK_FOR_INTERRUPTS().
+ *
+ * NOTE: For historical reasons, it's important that this is added to the
+ * WaitEventSet after the ExecAsyncConfigureWait() calls. Namely,
+ * postgres_fdw calls "GetNumRegisteredWaitEvents(set) == 1" to check if
+ * any other events are in the set. That's a poor design, it's
+ * questionable for postgres_fdw to be doing that in the first place, but
+ * we cannot change it now. The pattern has possibly been copied to other
+ * extensions too.
+ */
+ AddWaitEventToSet(node->eventset, WL_LATCH_SET, PGINVALID_SOCKET,
+ MyLatch, NULL);
+
+ /* Return at most EVENT_BUFFER_SIZE events in one call. */
+ if (nevents > EVENT_BUFFER_SIZE)
+ nevents = EVENT_BUFFER_SIZE;
+
+ /* Wait until at least one event occurs. */
+ noccurred = WaitEventSetWait(node->eventset, timeout, occurred_event,
+ nevents, wait_event_info);
+
+
+ FreeWaitEventSet(node->eventset);
+ node->eventset = NULL;
+ if (noccurred == 0)
+ return;
+
+
+ /* Deliver notifications. */
+ for (i = 0; i < noccurred; i++)
+ {
+ WaitEvent *w = &occurred_event[i];
+
+ /*
+ * Each waiting subplan should have registered its wait event with
+ * user_data pointing back to its AsyncRequest.
+ */
+ if ((w->events & WL_SOCKET_READABLE) != 0)
+ {
+ AsyncRequest *areq = (AsyncRequest *) w->user_data;
+
+ if (areq->callback_pending)
+ {
+ /*
+ * Mark it as no longer needing a callback. We must do this
+ * before dispatching the callback in case the callback resets
+ * the flag.
+ */
+ areq->callback_pending = false;
+
+ /* Do the actual work. */
+ ExecAsyncNotify(areq);
+ }
+ }
+
+ /* Handle standard interrupts */
+ if ((w->events & WL_LATCH_SET) != 0)
+ {
+ ResetLatch(MyLatch);
+ CHECK_FOR_INTERRUPTS();
+ }
+ }
+}
+
+/* Begin executing async-capable subplans. */
+void
+ExecAppenderAsyncBegin(AppenderState * node)
+{
+ int i;
+
+ /* Backward scan is not supported by async-aware Appends. */
+ Assert(ScanDirectionIsForward(node->ps.state->es_direction));
+
+ /* We should never be called when there are no subplans */
+ Assert(node->nplans > 0);
+
+ /* We should never be called when there are no async subplans. */
+ Assert(node->nasyncplans > 0);
+
+ /* Make a request for each of the valid async subplans. */
+ i = -1;
+ while ((i = bms_next_member(node->valid_asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->asyncrequests[i];
+
+ Assert(areq->request_index == i);
+ Assert(!areq->callback_pending);
+
+ /* Do the actual work. */
+ ExecAsyncRequest(areq);
+ }
+}
+
+/* Shuts down the subplans of an Appender node. */
+void
+ExecEndAppender(AppenderState * node)
+{
+ PlanState **subplans;
+ int nplans;
+ int i;
+
+ /*
+ * get information from the node
+ */
+ subplans = node->plans;
+ nplans = node->nplans;
+
+ /*
+ * shut down each of the subscans
+ */
+ for (i = 0; i < nplans; i++)
+ ExecEndNode(subplans[i]);
+}
+
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index 3bfdc0230ff..e8cf2ead8a8 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -375,9 +375,9 @@ search_plan_tree(PlanState *node, Oid table_oid,
AppendState *astate = (AppendState *) node;
int i;
- for (i = 0; i < astate->as_nplans; i++)
+ for (i = 0; i < astate->as.nplans; i++)
{
- ScanState *elem = search_plan_tree(astate->appendplans[i],
+ ScanState *elem = search_plan_tree(astate->as.plans[i],
table_oid,
pending_rescan);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..3eb1de1cd30 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -910,8 +910,8 @@ ExecSetTupleBound(int64 tuples_needed, PlanState *child_node)
AppendState *aState = (AppendState *) child_node;
int i;
- for (i = 0; i < aState->as_nplans; i++)
- ExecSetTupleBound(tuples_needed, aState->appendplans[i]);
+ for (i = 0; i < aState->as.nplans; i++)
+ ExecSetTupleBound(tuples_needed, aState->as.plans[i]);
}
else if (IsA(child_node, MergeAppendState))
{
@@ -923,8 +923,8 @@ ExecSetTupleBound(int64 tuples_needed, PlanState *child_node)
MergeAppendState *maState = (MergeAppendState *) child_node;
int i;
- for (i = 0; i < maState->ms_nplans; i++)
- ExecSetTupleBound(tuples_needed, maState->mergeplans[i]);
+ for (i = 0; i < maState->ms.nplans; i++)
+ ExecSetTupleBound(tuples_needed, maState->ms.plans[i]);
}
else if (IsA(child_node, ResultState))
{
diff --git a/src/backend/executor/meson.build b/src/backend/executor/meson.build
index 2cea41f8771..b5cb710a59f 100644
--- a/src/backend/executor/meson.build
+++ b/src/backend/executor/meson.build
@@ -3,6 +3,7 @@
backend_sources += files(
'execAmi.c',
'execAsync.c',
+ 'execAppend.c',
'execCurrent.c',
'execExpr.c',
'execExprInterp.c',
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index dfbc7b510c4..5c39ee275d2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -57,13 +57,13 @@
#include "postgres.h"
+#include "executor/execAppend.h"
#include "executor/execAsync.h"
#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAppend.h"
#include "miscadmin.h"
#include "pgstat.h"
-#include "storage/latch.h"
/* Shared state for parallel-aware Append. */
struct ParallelAppendState
@@ -109,15 +109,6 @@ AppendState *
ExecInitAppend(Append *node, EState *estate, int eflags)
{
AppendState *appendstate = makeNode(AppendState);
- PlanState **appendplanstates;
- const TupleTableSlotOps *appendops;
- Bitmapset *validsubplans;
- Bitmapset *asyncplans;
- int nplans;
- int nasyncplans;
- int firstvalid;
- int i,
- j;
/* check for unsupported flags */
Assert(!(eflags & EXEC_FLAG_MARK));
@@ -125,167 +116,27 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/*
* create new AppendState for our append node
*/
- appendstate->ps.plan = (Plan *) node;
- appendstate->ps.state = estate;
- appendstate->ps.ExecProcNode = ExecAppend;
+ appendstate->as.ps.plan = (Plan *) node;
+ appendstate->as.ps.state = estate;
+ appendstate->as.ps.ExecProcNode = ExecAppend;
/* Let choose_next_subplan_* function handle setting the first subplan */
appendstate->as_whichplan = INVALID_SUBPLAN_INDEX;
appendstate->as_syncdone = false;
appendstate->as_begun = false;
- /* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_index >= 0)
- {
- PartitionPruneState *prunestate;
-
- /*
- * Set up pruning data structure. This also initializes the set of
- * subplans to initialize (validsubplans) by taking into account the
- * result of performing initial pruning if any.
- */
- prunestate = ExecInitPartitionExecPruning(&appendstate->ps,
- list_length(node->appendplans),
- node->part_prune_index,
- node->apprelids,
- &validsubplans);
- appendstate->as_prune_state = prunestate;
- nplans = bms_num_members(validsubplans);
-
- /*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill as_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
- */
- if (!prunestate->do_exec_prune && nplans > 0)
- {
- appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
- appendstate->as_valid_subplans_identified = true;
- }
- }
- else
- {
- nplans = list_length(node->appendplans);
-
- /*
- * When run-time partition pruning is not enabled we can just mark all
- * subplans as valid; they must also all be initialized.
- */
- Assert(nplans > 0);
- appendstate->as_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
- appendstate->as_valid_subplans_identified = true;
- appendstate->as_prune_state = NULL;
- }
-
- appendplanstates = (PlanState **) palloc(nplans *
- sizeof(PlanState *));
-
- /*
- * call ExecInitNode on each of the valid plans to be executed and save
- * the results into the appendplanstates array.
- *
- * While at it, find out the first valid partial plan.
- */
- j = 0;
- asyncplans = NULL;
- nasyncplans = 0;
- firstvalid = nplans;
- i = -1;
- while ((i = bms_next_member(validsubplans, i)) >= 0)
- {
- Plan *initNode = (Plan *) list_nth(node->appendplans, i);
-
- /*
- * Record async subplans. When executing EvalPlanQual, we treat them
- * as sync ones; don't do this when initializing an EvalPlanQual plan
- * tree.
- */
- if (initNode->async_capable && estate->es_epq_active == NULL)
- {
- asyncplans = bms_add_member(asyncplans, j);
- nasyncplans++;
- }
-
- /*
- * Record the lowest appendplans index which is a valid partial plan.
- */
- if (i >= node->first_partial_plan && j < firstvalid)
- firstvalid = j;
-
- appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
- }
-
- appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
+ /* Initialize common fields */
+ ExecInitAppender(&appendstate->as,
+ &node->ap,
+ estate,
+ eflags,
+ node->first_partial_plan,
+ &appendstate->as_first_partial_plan);
- /*
- * Initialize Append's result tuple type and slot. If the child plans all
- * produce the same fixed slot type, we can use that slot type; otherwise
- * make a virtual slot. (Note that the result slot itself is used only to
- * return a null tuple at end of execution; real tuples are returned to
- * the caller in the children's own result slots. What we are doing here
- * is allowing the parent plan node to optimize if the Append will return
- * only one kind of slot.)
- */
- appendops = ExecGetCommonSlotOps(appendplanstates, j);
- if (appendops != NULL)
- {
- ExecInitResultTupleSlotTL(&appendstate->ps, appendops);
- }
- else
- {
- ExecInitResultTupleSlotTL(&appendstate->ps, &TTSOpsVirtual);
- /* show that the output slot type is not fixed */
- appendstate->ps.resultopsset = true;
- appendstate->ps.resultopsfixed = false;
- }
+ if (appendstate->as.nasyncplans > 0 && appendstate->as.valid_subplans_identified)
+ classify_matching_subplans(appendstate);
- /* Initialize async state */
- appendstate->as_asyncplans = asyncplans;
- appendstate->as_nasyncplans = nasyncplans;
- appendstate->as_asyncrequests = NULL;
- appendstate->as_asyncresults = NULL;
- appendstate->as_nasyncresults = 0;
appendstate->as_nasyncremain = 0;
- appendstate->as_needrequest = NULL;
- appendstate->as_eventset = NULL;
- appendstate->as_valid_asyncplans = NULL;
-
- if (nasyncplans > 0)
- {
- appendstate->as_asyncrequests = (AsyncRequest **)
- palloc0(nplans * sizeof(AsyncRequest *));
-
- i = -1;
- while ((i = bms_next_member(asyncplans, i)) >= 0)
- {
- AsyncRequest *areq;
-
- areq = palloc_object(AsyncRequest);
- areq->requestor = (PlanState *) appendstate;
- areq->requestee = appendplanstates[i];
- areq->request_index = i;
- areq->callback_pending = false;
- areq->request_complete = false;
- areq->result = NULL;
-
- appendstate->as_asyncrequests[i] = areq;
- }
-
- appendstate->as_asyncresults = (TupleTableSlot **)
- palloc0(nasyncplans * sizeof(TupleTableSlot *));
-
- if (appendstate->as_valid_subplans_identified)
- classify_matching_subplans(appendstate);
- }
-
- /*
- * Miscellaneous initialization
- */
-
- appendstate->ps.ps_ProjInfo = NULL;
/* For parallel query, this will be overridden later. */
appendstate->choose_next_subplan = choose_next_subplan_locally;
@@ -315,11 +166,11 @@ ExecAppend(PlanState *pstate)
Assert(!node->as_syncdone);
/* Nothing to do if there are no subplans */
- if (node->as_nplans == 0)
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->as.nplans == 0)
+ return ExecClearTuple(node->as.ps.ps_ResultTupleSlot);
/* If there are any async subplans, begin executing them. */
- if (node->as_nasyncplans > 0)
+ if (node->as.nasyncplans > 0)
ExecAppendAsyncBegin(node);
/*
@@ -327,11 +178,11 @@ ExecAppend(PlanState *pstate)
* proceeding.
*/
if (!node->choose_next_subplan(node) && node->as_nasyncremain == 0)
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ return ExecClearTuple(node->as.ps.ps_ResultTupleSlot);
Assert(node->as_syncdone ||
(node->as_whichplan >= 0 &&
- node->as_whichplan < node->as_nplans));
+ node->as_whichplan < node->as.nplans));
/* And we're initialized. */
node->as_begun = true;
@@ -346,19 +197,19 @@ ExecAppend(PlanState *pstate)
/*
* try to get a tuple from an async subplan if any
*/
- if (node->as_syncdone || !bms_is_empty(node->as_needrequest))
+ if (node->as_syncdone || !bms_is_empty(node->as.needrequest))
{
if (ExecAppendAsyncGetNext(node, &result))
return result;
Assert(!node->as_syncdone);
- Assert(bms_is_empty(node->as_needrequest));
+ Assert(bms_is_empty(node->as.needrequest));
}
/*
* figure out which sync subplan we are currently processing
*/
- Assert(node->as_whichplan >= 0 && node->as_whichplan < node->as_nplans);
- subnode = node->appendplans[node->as_whichplan];
+ Assert(node->as_whichplan >= 0 && node->as_whichplan < node->as.nplans);
+ subnode = node->as.plans[node->as_whichplan];
/*
* get a tuple from the subplan
@@ -385,7 +236,7 @@ ExecAppend(PlanState *pstate)
/* choose new sync subplan; if no sync/async subplans, we're done */
if (!node->choose_next_subplan(node) && node->as_nasyncremain == 0)
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ return ExecClearTuple(node->as.ps.ps_ResultTupleSlot);
}
}
@@ -400,81 +251,22 @@ ExecAppend(PlanState *pstate)
void
ExecEndAppend(AppendState *node)
{
- PlanState **appendplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- appendplans = node->appendplans;
- nplans = node->as_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(appendplans[i]);
+ ExecEndAppender(&node->as);
}
void
ExecReScanAppend(AppendState *node)
{
- int nasyncplans = node->as_nasyncplans;
- int i;
-
- /*
- * If any PARAM_EXEC Params used in pruning expressions have changed, then
- * we'd better unset the valid subplans so that they are reselected for
- * the new parameter values.
- */
- if (node->as_prune_state &&
- bms_overlap(node->ps.chgParam,
- node->as_prune_state->execparamids))
- {
- node->as_valid_subplans_identified = false;
- bms_free(node->as_valid_subplans);
- node->as_valid_subplans = NULL;
- bms_free(node->as_valid_asyncplans);
- node->as_valid_asyncplans = NULL;
- }
-
- for (i = 0; i < node->as_nplans; i++)
- {
- PlanState *subnode = node->appendplans[i];
- /*
- * ExecReScan doesn't know about my subplans, so I have to do
- * changed-parameter signaling myself.
- */
- if (node->ps.chgParam != NULL)
- UpdateChangedParamSet(subnode, node->ps.chgParam);
+ int nasyncplans = node->as.nasyncplans;
- /*
- * If chgParam of subnode is not null then plan will be re-scanned by
- * first ExecProcNode or by first ExecAsyncRequest.
- */
- if (subnode->chgParam == NULL)
- ExecReScan(subnode);
- }
+ ExecReScanAppender(&node->as);
- /* Reset async state */
+ /* Reset specific append async state */
if (nasyncplans > 0)
{
- i = -1;
- while ((i = bms_next_member(node->as_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->as_asyncrequests[i];
-
- areq->callback_pending = false;
- areq->request_complete = false;
- areq->result = NULL;
- }
-
node->as_nasyncresults = 0;
node->as_nasyncremain = 0;
- bms_free(node->as_needrequest);
- node->as_needrequest = NULL;
}
/* Let choose_next_subplan_* function handle setting the first subplan */
@@ -501,7 +293,7 @@ ExecAppendEstimate(AppendState *node,
{
node->pstate_len =
add_size(offsetof(ParallelAppendState, pa_finished),
- sizeof(bool) * node->as_nplans);
+ sizeof(bool) * node->as.nplans);
shm_toc_estimate_chunk(&pcxt->estimator, node->pstate_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
@@ -523,7 +315,7 @@ ExecAppendInitializeDSM(AppendState *node,
pstate = shm_toc_allocate(pcxt->toc, node->pstate_len);
memset(pstate, 0, node->pstate_len);
LWLockInitialize(&pstate->pa_lock, LWTRANCHE_PARALLEL_APPEND);
- shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, pstate);
+ shm_toc_insert(pcxt->toc, node->as.ps.plan->plan_node_id, pstate);
node->as_pstate = pstate;
node->choose_next_subplan = choose_next_subplan_for_leader;
@@ -541,7 +333,7 @@ ExecAppendReInitializeDSM(AppendState *node, ParallelContext *pcxt)
ParallelAppendState *pstate = node->as_pstate;
pstate->pa_next_plan = 0;
- memset(pstate->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ memset(pstate->pa_finished, 0, sizeof(bool) * node->as.nplans);
}
/* ----------------------------------------------------------------
@@ -554,7 +346,7 @@ ExecAppendReInitializeDSM(AppendState *node, ParallelContext *pcxt)
void
ExecAppendInitializeWorker(AppendState *node, ParallelWorkerContext *pwcxt)
{
- node->as_pstate = shm_toc_lookup(pwcxt->toc, node->ps.plan->plan_node_id, false);
+ node->as_pstate = shm_toc_lookup(pwcxt->toc, node->as.ps.plan->plan_node_id, false);
node->choose_next_subplan = choose_next_subplan_for_worker;
}
@@ -572,7 +364,7 @@ choose_next_subplan_locally(AppendState *node)
int nextplan;
/* We should never be called when there are no subplans */
- Assert(node->as_nplans > 0);
+ Assert(node->as.nplans > 0);
/* Nothing to do if syncdone */
if (node->as_syncdone)
@@ -587,33 +379,33 @@ choose_next_subplan_locally(AppendState *node)
*/
if (whichplan == INVALID_SUBPLAN_INDEX)
{
- if (node->as_nasyncplans > 0)
+ if (node->as.nasyncplans > 0)
{
/* We'd have filled as_valid_subplans already */
- Assert(node->as_valid_subplans_identified);
+ Assert(node->as.valid_subplans_identified);
}
- else if (!node->as_valid_subplans_identified)
+ else if (!node->as.valid_subplans_identified)
{
- node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
- node->as_valid_subplans_identified = true;
+ node->as.valid_subplans =
+ ExecFindMatchingSubPlans(node->as.prune_state, false, NULL);
+ node->as.valid_subplans_identified = true;
}
whichplan = -1;
}
/* Ensure whichplan is within the expected range */
- Assert(whichplan >= -1 && whichplan <= node->as_nplans);
+ Assert(whichplan >= -1 && whichplan <= node->as.nplans);
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- nextplan = bms_next_member(node->as_valid_subplans, whichplan);
+ if (ScanDirectionIsForward(node->as.ps.state->es_direction))
+ nextplan = bms_next_member(node->as.valid_subplans, whichplan);
else
- nextplan = bms_prev_member(node->as_valid_subplans, whichplan);
+ nextplan = bms_prev_member(node->as.valid_subplans, whichplan);
if (nextplan < 0)
{
/* Set as_syncdone if in async mode */
- if (node->as_nasyncplans > 0)
+ if (node->as.nasyncplans > 0)
node->as_syncdone = true;
return false;
}
@@ -637,10 +429,10 @@ choose_next_subplan_for_leader(AppendState *node)
ParallelAppendState *pstate = node->as_pstate;
/* Backward scan is not supported by parallel-aware plans */
- Assert(ScanDirectionIsForward(node->ps.state->es_direction));
+ Assert(ScanDirectionIsForward(node->as.ps.state->es_direction));
/* We should never be called when there are no subplans */
- Assert(node->as_nplans > 0);
+ Assert(node->as.nplans > 0);
LWLockAcquire(&pstate->pa_lock, LW_EXCLUSIVE);
@@ -652,18 +444,18 @@ choose_next_subplan_for_leader(AppendState *node)
else
{
/* Start with last subplan. */
- node->as_whichplan = node->as_nplans - 1;
+ node->as_whichplan = node->as.nplans - 1;
/*
* If we've yet to determine the valid subplans then do so now. If
* run-time pruning is disabled then the valid subplans will always be
* set to all subplans.
*/
- if (!node->as_valid_subplans_identified)
+ if (!node->as.valid_subplans_identified)
{
- node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
- node->as_valid_subplans_identified = true;
+ node->as.valid_subplans =
+ ExecFindMatchingSubPlans(node->as.prune_state, false, NULL);
+ node->as.valid_subplans_identified = true;
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -719,10 +511,10 @@ choose_next_subplan_for_worker(AppendState *node)
ParallelAppendState *pstate = node->as_pstate;
/* Backward scan is not supported by parallel-aware plans */
- Assert(ScanDirectionIsForward(node->ps.state->es_direction));
+ Assert(ScanDirectionIsForward(node->as.ps.state->es_direction));
/* We should never be called when there are no subplans */
- Assert(node->as_nplans > 0);
+ Assert(node->as.nplans > 0);
LWLockAcquire(&pstate->pa_lock, LW_EXCLUSIVE);
@@ -735,11 +527,11 @@ choose_next_subplan_for_worker(AppendState *node)
* run-time pruning is disabled then the valid subplans will always be set
* to all subplans.
*/
- else if (!node->as_valid_subplans_identified)
+ else if (!node->as.valid_subplans_identified)
{
- node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
- node->as_valid_subplans_identified = true;
+ node->as.valid_subplans =
+ ExecFindMatchingSubPlans(node->as.prune_state, false, NULL);
+ node->as.valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
}
@@ -759,7 +551,7 @@ choose_next_subplan_for_worker(AppendState *node)
{
int nextplan;
- nextplan = bms_next_member(node->as_valid_subplans,
+ nextplan = bms_next_member(node->as.valid_subplans,
pstate->pa_next_plan);
if (nextplan >= 0)
{
@@ -772,7 +564,7 @@ choose_next_subplan_for_worker(AppendState *node)
* Try looping back to the first valid partial plan, if there is
* one. If there isn't, arrange to bail out below.
*/
- nextplan = bms_next_member(node->as_valid_subplans,
+ nextplan = bms_next_member(node->as.valid_subplans,
node->as_first_partial_plan - 1);
pstate->pa_next_plan =
nextplan < 0 ? node->as_whichplan : nextplan;
@@ -797,7 +589,7 @@ choose_next_subplan_for_worker(AppendState *node)
/* Pick the plan we found, and advance pa_next_plan one more time. */
node->as_whichplan = pstate->pa_next_plan;
- pstate->pa_next_plan = bms_next_member(node->as_valid_subplans,
+ pstate->pa_next_plan = bms_next_member(node->as.valid_subplans,
pstate->pa_next_plan);
/*
@@ -806,7 +598,7 @@ choose_next_subplan_for_worker(AppendState *node)
*/
if (pstate->pa_next_plan < 0)
{
- int nextplan = bms_next_member(node->as_valid_subplans,
+ int nextplan = bms_next_member(node->as.valid_subplans,
node->as_first_partial_plan - 1);
if (nextplan >= 0)
@@ -848,16 +640,16 @@ mark_invalid_subplans_as_finished(AppendState *node)
Assert(node->as_pstate);
/* Shouldn't have been called when run-time pruning is not enabled */
- Assert(node->as_prune_state);
+ Assert(node->as.prune_state);
/* Nothing to do if all plans are valid */
- if (bms_num_members(node->as_valid_subplans) == node->as_nplans)
+ if (bms_num_members(node->as.valid_subplans) == node->as.nplans)
return;
/* Mark all non-valid plans as finished */
- for (i = 0; i < node->as_nplans; i++)
+ for (i = 0; i < node->as.nplans; i++)
{
- if (!bms_is_member(i, node->as_valid_subplans))
+ if (!bms_is_member(i, node->as.valid_subplans))
node->as_pstate->pa_finished[i] = true;
}
}
@@ -876,47 +668,25 @@ mark_invalid_subplans_as_finished(AppendState *node)
static void
ExecAppendAsyncBegin(AppendState *node)
{
- int i;
-
- /* Backward scan is not supported by async-aware Appends. */
- Assert(ScanDirectionIsForward(node->ps.state->es_direction));
-
- /* We should never be called when there are no subplans */
- Assert(node->as_nplans > 0);
-
- /* We should never be called when there are no async subplans. */
- Assert(node->as_nasyncplans > 0);
-
/* If we've yet to determine the valid subplans then do so now. */
- if (!node->as_valid_subplans_identified)
+ if (!node->as.valid_subplans_identified)
{
- node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
- node->as_valid_subplans_identified = true;
+ node->as.valid_subplans =
+ ExecFindMatchingSubPlans(node->as.prune_state, false, NULL);
+ node->as.valid_subplans_identified = true;
classify_matching_subplans(node);
}
/* Initialize state variables. */
- node->as_syncdone = bms_is_empty(node->as_valid_subplans);
- node->as_nasyncremain = bms_num_members(node->as_valid_asyncplans);
+ node->as_syncdone = bms_is_empty(node->as.valid_subplans);
+ node->as_nasyncremain = bms_num_members(node->as.valid_asyncplans);
/* Nothing to do if there are no valid async subplans. */
if (node->as_nasyncremain == 0)
return;
- /* Make a request for each of the valid async subplans. */
- i = -1;
- while ((i = bms_next_member(node->as_valid_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->as_asyncrequests[i];
-
- Assert(areq->request_index == i);
- Assert(!areq->callback_pending);
-
- /* Do the actual work. */
- ExecAsyncRequest(areq);
- }
+ ExecAppenderAsyncBegin(&node->as);
}
/* ----------------------------------------------------------------
@@ -961,7 +731,7 @@ ExecAppendAsyncGetNext(AppendState *node, TupleTableSlot **result)
if (node->as_syncdone)
{
Assert(node->as_nasyncremain == 0);
- *result = ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ *result = ExecClearTuple(node->as.ps.ps_ResultTupleSlot);
return true;
}
@@ -981,7 +751,7 @@ ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result)
int i;
/* Nothing to do if there are no async subplans needing a new request. */
- if (bms_is_empty(node->as_needrequest))
+ if (bms_is_empty(node->as.needrequest))
{
Assert(node->as_nasyncresults == 0);
return false;
@@ -994,17 +764,17 @@ ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result)
if (node->as_nasyncresults > 0)
{
--node->as_nasyncresults;
- *result = node->as_asyncresults[node->as_nasyncresults];
+ *result = node->as.asyncresults[node->as_nasyncresults];
return true;
}
/* Make a new request for each of the async subplans that need it. */
- needrequest = node->as_needrequest;
- node->as_needrequest = NULL;
+ needrequest = node->as.needrequest;
+ node->as.needrequest = NULL;
i = -1;
while ((i = bms_next_member(needrequest, i)) >= 0)
{
- AsyncRequest *areq = node->as_asyncrequests[i];
+ AsyncRequest *areq = node->as.asyncrequests[i];
/* Do the actual work. */
ExecAsyncRequest(areq);
@@ -1015,7 +785,7 @@ ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result)
if (node->as_nasyncresults > 0)
{
--node->as_nasyncresults;
- *result = node->as_asyncresults[node->as_nasyncresults];
+ *result = node->as.asyncresults[node->as_nasyncresults];
return true;
}
@@ -1031,105 +801,12 @@ ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result)
static void
ExecAppendAsyncEventWait(AppendState *node)
{
- int nevents = node->as_nasyncplans + 2;
long timeout = node->as_syncdone ? -1 : 0;
- WaitEvent occurred_event[EVENT_BUFFER_SIZE];
- int noccurred;
- int i;
/* We should never be called when there are no valid async subplans. */
Assert(node->as_nasyncremain > 0);
- Assert(node->as_eventset == NULL);
- node->as_eventset = CreateWaitEventSet(CurrentResourceOwner, nevents);
- AddWaitEventToSet(node->as_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
- NULL, NULL);
-
- /* Give each waiting subplan a chance to add an event. */
- i = -1;
- while ((i = bms_next_member(node->as_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->as_asyncrequests[i];
-
- if (areq->callback_pending)
- ExecAsyncConfigureWait(areq);
- }
-
- /*
- * No need for further processing if none of the subplans configured any
- * events.
- */
- if (GetNumRegisteredWaitEvents(node->as_eventset) == 1)
- {
- FreeWaitEventSet(node->as_eventset);
- node->as_eventset = NULL;
- return;
- }
-
- /*
- * Add the process latch to the set, so that we wake up to process the
- * standard interrupts with CHECK_FOR_INTERRUPTS().
- *
- * NOTE: For historical reasons, it's important that this is added to the
- * WaitEventSet after the ExecAsyncConfigureWait() calls. Namely,
- * postgres_fdw calls "GetNumRegisteredWaitEvents(set) == 1" to check if
- * any other events are in the set. That's a poor design, it's
- * questionable for postgres_fdw to be doing that in the first place, but
- * we cannot change it now. The pattern has possibly been copied to other
- * extensions too.
- */
- AddWaitEventToSet(node->as_eventset, WL_LATCH_SET, PGINVALID_SOCKET,
- MyLatch, NULL);
-
- /* Return at most EVENT_BUFFER_SIZE events in one call. */
- if (nevents > EVENT_BUFFER_SIZE)
- nevents = EVENT_BUFFER_SIZE;
-
- /*
- * If the timeout is -1, wait until at least one event occurs. If the
- * timeout is 0, poll for events, but do not wait at all.
- */
- noccurred = WaitEventSetWait(node->as_eventset, timeout, occurred_event,
- nevents, WAIT_EVENT_APPEND_READY);
- FreeWaitEventSet(node->as_eventset);
- node->as_eventset = NULL;
- if (noccurred == 0)
- return;
-
- /* Deliver notifications. */
- for (i = 0; i < noccurred; i++)
- {
- WaitEvent *w = &occurred_event[i];
-
- /*
- * Each waiting subplan should have registered its wait event with
- * user_data pointing back to its AsyncRequest.
- */
- if ((w->events & WL_SOCKET_READABLE) != 0)
- {
- AsyncRequest *areq = (AsyncRequest *) w->user_data;
-
- if (areq->callback_pending)
- {
- /*
- * Mark it as no longer needing a callback. We must do this
- * before dispatching the callback in case the callback resets
- * the flag.
- */
- areq->callback_pending = false;
-
- /* Do the actual work. */
- ExecAsyncNotify(areq);
- }
- }
-
- /* Handle standard interrupts */
- if ((w->events & WL_LATCH_SET) != 0)
- {
- ResetLatch(MyLatch);
- CHECK_FOR_INTERRUPTS();
- }
- }
+ ExecAppenderAsyncEventWait(&node->as, timeout, WAIT_EVENT_APPEND_READY);
}
/* ----------------------------------------------------------------
@@ -1165,14 +842,14 @@ ExecAsyncAppendResponse(AsyncRequest *areq)
}
/* Save result so we can return it. */
- Assert(node->as_nasyncresults < node->as_nasyncplans);
- node->as_asyncresults[node->as_nasyncresults++] = slot;
+ Assert(node->as_nasyncresults < node->as.nasyncplans);
+ node->as.asyncresults[node->as_nasyncresults++] = slot;
/*
* Mark the subplan that returned a result as ready for a new request. We
* don't launch another one here immediately because it might complete.
*/
- node->as_needrequest = bms_add_member(node->as_needrequest,
+ node->as.needrequest = bms_add_member(node->as.needrequest,
areq->request_index);
}
@@ -1187,10 +864,10 @@ ExecAsyncAppendResponse(AsyncRequest *areq)
static void
classify_matching_subplans(AppendState *node)
{
- Assert(node->as_valid_subplans_identified);
+ Assert(node->as.valid_subplans_identified);
/* Nothing to do if there are no valid subplans. */
- if (bms_is_empty(node->as_valid_subplans))
+ if (bms_is_empty(node->as.valid_subplans))
{
node->as_syncdone = true;
node->as_nasyncremain = 0;
@@ -1199,8 +876,8 @@ classify_matching_subplans(AppendState *node)
/* No valid async subplans identified. */
if (!classify_matching_subplans_common(
- &node->as_valid_subplans,
- node->as_asyncplans,
- &node->as_valid_asyncplans))
+ &node->as.valid_subplans,
+ node->as.asyncplans,
+ &node->as.valid_asyncplans))
node->as_nasyncremain = 0;
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f1c267eb9eb..e1a207aeb85 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -38,6 +38,7 @@
#include "postgres.h"
+#include "executor/execAppend.h"
#include "executor/executor.h"
#include "executor/execAsync.h"
#include "executor/execPartition.h"
@@ -76,14 +77,7 @@ MergeAppendState *
ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
MergeAppendState *mergestate = makeNode(MergeAppendState);
- PlanState **mergeplanstates;
- const TupleTableSlotOps *mergeops;
- Bitmapset *validsubplans;
- int nplans;
- int i,
- j;
- Bitmapset *asyncplans;
- int nasyncplans;
+ int i;
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -91,154 +85,27 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* create new MergeAppendState for our node
*/
- mergestate->ps.plan = (Plan *) node;
- mergestate->ps.state = estate;
- mergestate->ps.ExecProcNode = ExecMergeAppend;
-
- /* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_index >= 0)
- {
- PartitionPruneState *prunestate;
-
- /*
- * Set up pruning data structure. This also initializes the set of
- * subplans to initialize (validsubplans) by taking into account the
- * result of performing initial pruning if any.
- */
- prunestate = ExecInitPartitionExecPruning(&mergestate->ps,
- list_length(node->mergeplans),
- node->part_prune_index,
- node->apprelids,
- &validsubplans);
- mergestate->ms_prune_state = prunestate;
- nplans = bms_num_members(validsubplans);
-
- /*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill ms_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
- */
- if (!prunestate->do_exec_prune && nplans > 0)
- {
- mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
- mergestate->ms_valid_subplans_identified = true;
- }
- }
- else
- {
- nplans = list_length(node->mergeplans);
-
- /*
- * When run-time partition pruning is not enabled we can just mark all
- * subplans as valid; they must also all be initialized.
- */
- Assert(nplans > 0);
- mergestate->ms_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
- mergestate->ms_valid_subplans_identified = true;
- mergestate->ms_prune_state = NULL;
- }
-
- mergeplanstates = palloc_array(PlanState *, nplans);
- mergestate->mergeplans = mergeplanstates;
- mergestate->ms_nplans = nplans;
-
- mergestate->ms_slots = palloc0_array(TupleTableSlot *, nplans);
- mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots,
+ mergestate->ms.ps.plan = (Plan *) node;
+ mergestate->ms.ps.state = estate;
+ mergestate->ms.ps.ExecProcNode = ExecMergeAppend;
+
+ /* Initialize common fields */
+ ExecInitAppender(&mergestate->ms,
+ &node->ap,
+ estate,
+ eflags,
+ -1,
+ NULL);
+
+ if (mergestate->ms.nasyncplans > 0 && mergestate->ms.valid_subplans_identified)
+ classify_matching_subplans(mergestate);
+
+ mergestate->ms_slots = palloc0_array(TupleTableSlot *, mergestate->ms.nplans);
+ mergestate->ms_heap = binaryheap_allocate(mergestate->ms.nplans, heap_compare_slots,
mergestate);
- /*
- * call ExecInitNode on each of the valid plans to be executed and save
- * the results into the mergeplanstates array.
- */
- j = 0;
- asyncplans = NULL;
- nasyncplans = 0;
-
- i = -1;
- while ((i = bms_next_member(validsubplans, i)) >= 0)
- {
- Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
-
- /*
- * Record async subplans. When executing EvalPlanQual, we treat them
- * as sync ones; don't do this when initializing an EvalPlanQual plan
- * tree.
- */
- if (initNode->async_capable && estate->es_epq_active == NULL)
- {
- asyncplans = bms_add_member(asyncplans, j);
- nasyncplans++;
- }
-
- mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
- }
-
- /*
- * Initialize MergeAppend's result tuple type and slot. If the child
- * plans all produce the same fixed slot type, we can use that slot type;
- * otherwise make a virtual slot. (Note that the result slot itself is
- * used only to return a null tuple at end of execution; real tuples are
- * returned to the caller in the children's own result slots. What we are
- * doing here is allowing the parent plan node to optimize if the
- * MergeAppend will return only one kind of slot.)
- */
- mergeops = ExecGetCommonSlotOps(mergeplanstates, j);
- if (mergeops != NULL)
- {
- ExecInitResultTupleSlotTL(&mergestate->ps, mergeops);
- }
- else
- {
- ExecInitResultTupleSlotTL(&mergestate->ps, &TTSOpsVirtual);
- /* show that the output slot type is not fixed */
- mergestate->ps.resultopsset = true;
- mergestate->ps.resultopsfixed = false;
- }
-
- /*
- * Miscellaneous initialization
- */
- mergestate->ps.ps_ProjInfo = NULL;
-
- /* Initialize async state */
- mergestate->ms_asyncplans = asyncplans;
- mergestate->ms_nasyncplans = nasyncplans;
- mergestate->ms_asyncrequests = NULL;
- mergestate->ms_asyncresults = NULL;
mergestate->ms_has_asyncresults = NULL;
mergestate->ms_asyncremain = NULL;
- mergestate->ms_needrequest = NULL;
- mergestate->ms_eventset = NULL;
- mergestate->ms_valid_asyncplans = NULL;
-
- if (nasyncplans > 0)
- {
- mergestate->ms_asyncrequests = (AsyncRequest **)
- palloc0(nplans * sizeof(AsyncRequest *));
-
- i = -1;
- while ((i = bms_next_member(asyncplans, i)) >= 0)
- {
- AsyncRequest *areq;
-
- areq = palloc(sizeof(AsyncRequest));
- areq->requestor = (PlanState *) mergestate;
- areq->requestee = mergeplanstates[i];
- areq->request_index = i;
- areq->callback_pending = false;
- areq->request_complete = false;
- areq->result = NULL;
-
- mergestate->ms_asyncrequests[i] = areq;
- }
-
- mergestate->ms_asyncresults = (TupleTableSlot **)
- palloc0(nplans * sizeof(TupleTableSlot *));
-
- if (mergestate->ms_valid_subplans_identified)
- classify_matching_subplans(mergestate);
- }
/*
* initialize sort-key information
@@ -293,20 +160,20 @@ ExecMergeAppend(PlanState *pstate)
if (!node->ms_initialized)
{
/* Nothing to do if all subplans were pruned */
- if (node->ms_nplans == 0)
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ms.nplans == 0)
+ return ExecClearTuple(node->ms.ps.ps_ResultTupleSlot);
/* If we've yet to determine the valid subplans then do so now. */
- if (!node->ms_valid_subplans_identified)
+ if (!node->ms.valid_subplans_identified)
{
- node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
- node->ms_valid_subplans_identified = true;
+ node->ms.valid_subplans =
+ ExecFindMatchingSubPlans(node->ms.prune_state, false, NULL);
+ node->ms.valid_subplans_identified = true;
classify_matching_subplans(node);
}
/* If there are any async subplans, begin executing them. */
- if (node->ms_nasyncplans > 0)
+ if (node->ms.nasyncplans > 0)
ExecMergeAppendAsyncBegin(node);
/*
@@ -314,16 +181,16 @@ ExecMergeAppend(PlanState *pstate)
* and set up the heap.
*/
i = -1;
- while ((i = bms_next_member(node->ms_valid_subplans, i)) >= 0)
+ while ((i = bms_next_member(node->ms.valid_subplans, i)) >= 0)
{
- node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+ node->ms_slots[i] = ExecProcNode(node->ms.plans[i]);
if (!TupIsNull(node->ms_slots[i]))
binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
}
/* Look at valid async subplans */
i = -1;
- while ((i = bms_next_member(node->ms_valid_asyncplans, i)) >= 0)
+ while ((i = bms_next_member(node->ms.valid_asyncplans, i)) >= 0)
{
ExecMergeAppendAsyncGetNext(node, i);
if (!TupIsNull(node->ms_slots[i]))
@@ -344,12 +211,12 @@ ExecMergeAppend(PlanState *pstate)
* to not pull tuples until necessary.)
*/
i = DatumGetInt32(binaryheap_first(node->ms_heap));
- if (bms_is_member(i, node->ms_asyncplans))
+ if (bms_is_member(i, node->ms.asyncplans))
ExecMergeAppendAsyncGetNext(node, i);
else
{
- Assert(bms_is_member(i, node->ms_valid_subplans));
- node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+ Assert(bms_is_member(i, node->ms.valid_subplans));
+ node->ms_slots[i] = ExecProcNode(node->ms.plans[i]);
}
if (!TupIsNull(node->ms_slots[i]))
binaryheap_replace_first(node->ms_heap, Int32GetDatum(i));
@@ -360,7 +227,7 @@ ExecMergeAppend(PlanState *pstate)
if (binaryheap_empty(node->ms_heap))
{
/* All the subplans are exhausted, and so is the heap */
- result = ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ result = ExecClearTuple(node->ms.ps.ps_ResultTupleSlot);
}
else
{
@@ -426,81 +293,21 @@ heap_compare_slots(Datum a, Datum b, void *arg)
void
ExecEndMergeAppend(MergeAppendState *node)
{
- PlanState **mergeplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- mergeplans = node->mergeplans;
- nplans = node->ms_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(mergeplans[i]);
+ ExecEndAppender(&node->ms);
}
void
ExecReScanMergeAppend(MergeAppendState *node)
{
- int i;
- int nasyncplans = node->ms_nasyncplans;
+ int nasyncplans = node->ms.nasyncplans;
- /*
- * If any PARAM_EXEC Params used in pruning expressions have changed, then
- * we'd better unset the valid subplans so that they are reselected for
- * the new parameter values.
- */
- if (node->ms_prune_state &&
- bms_overlap(node->ps.chgParam,
- node->ms_prune_state->execparamids))
- {
- node->ms_valid_subplans_identified = false;
- bms_free(node->ms_valid_subplans);
- node->ms_valid_subplans = NULL;
- bms_free(node->ms_valid_asyncplans);
- node->ms_valid_asyncplans = NULL;
- }
-
- for (i = 0; i < node->ms_nplans; i++)
- {
- PlanState *subnode = node->mergeplans[i];
-
- /*
- * ExecReScan doesn't know about my subplans, so I have to do
- * changed-parameter signaling myself.
- */
- if (node->ps.chgParam != NULL)
- UpdateChangedParamSet(subnode, node->ps.chgParam);
-
- /*
- * If chgParam of subnode is not null then plan will be re-scanned by
- * first ExecProcNode.
- */
- if (subnode->chgParam == NULL)
- ExecReScan(subnode);
- }
+ ExecReScanAppender(&node->ms);
- /* Reset async state */
+ /* Reset specific merge append async state */
if (nasyncplans > 0)
{
- i = -1;
- while ((i = bms_next_member(node->ms_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->ms_asyncrequests[i];
-
- areq->callback_pending = false;
- areq->request_complete = false;
- areq->result = NULL;
- }
-
bms_free(node->ms_asyncremain);
node->ms_asyncremain = NULL;
- bms_free(node->ms_needrequest);
- node->ms_needrequest = NULL;
bms_free(node->ms_has_asyncresults);
node->ms_has_asyncresults = NULL;
}
@@ -519,10 +326,10 @@ ExecReScanMergeAppend(MergeAppendState *node)
static void
classify_matching_subplans(MergeAppendState *node)
{
- Assert(node->ms_valid_subplans_identified);
+ Assert(node->ms.valid_subplans_identified);
/* Nothing to do if there are no valid subplans. */
- if (bms_is_empty(node->ms_valid_subplans))
+ if (bms_is_empty(node->ms.valid_subplans))
{
node->ms_asyncremain = NULL;
return;
@@ -530,9 +337,9 @@ classify_matching_subplans(MergeAppendState *node)
/* No valid async subplans identified. */
if (!classify_matching_subplans_common(
- &node->ms_valid_subplans,
- node->ms_asyncplans,
- &node->ms_valid_asyncplans))
+ &node->ms.valid_subplans,
+ node->ms.asyncplans,
+ &node->ms.valid_asyncplans))
node->ms_asyncremain = NULL;
}
@@ -545,39 +352,17 @@ classify_matching_subplans(MergeAppendState *node)
static void
ExecMergeAppendAsyncBegin(MergeAppendState *node)
{
- int i;
-
- /* Backward scan is not supported by async-aware MergeAppends. */
- Assert(ScanDirectionIsForward(node->ps.state->es_direction));
-
- /* We should never be called when there are no subplans */
- Assert(node->ms_nplans > 0);
-
- /* We should never be called when there are no async subplans. */
- Assert(node->ms_nasyncplans > 0);
-
/* ExecMergeAppend() identifies valid subplans */
- Assert(node->ms_valid_subplans_identified);
+ Assert(node->ms.valid_subplans_identified);
/* Initialize state variables. */
- node->ms_asyncremain = bms_copy(node->ms_valid_asyncplans);
+ node->ms_asyncremain = bms_copy(node->ms.valid_asyncplans);
/* Nothing to do if there are no valid async subplans. */
if (bms_is_empty(node->ms_asyncremain))
return;
- /* Make a request for each of the valid async subplans. */
- i = -1;
- while ((i = bms_next_member(node->ms_valid_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->ms_asyncrequests[i];
-
- Assert(areq->request_index == i);
- Assert(!areq->callback_pending);
-
- /* Do the actual work. */
- ExecAsyncRequest(areq);
- }
+ ExecAppenderAsyncBegin(&node->ms);
}
/* ----------------------------------------------------------------
@@ -638,7 +423,7 @@ ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
*/
if (bms_is_member(mplan, node->ms_has_asyncresults))
{
- node->ms_slots[mplan] = node->ms_asyncresults[mplan];
+ node->ms_slots[mplan] = node->ms.asyncresults[mplan];
return true;
}
@@ -648,7 +433,7 @@ ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
*/
needrequest = NULL;
i = -1;
- while ((i = bms_next_member(node->ms_needrequest, i)) >= 0)
+ while ((i = bms_next_member(node->ms.needrequest, i)) >= 0)
{
if (!bms_is_member(i, node->ms_has_asyncresults))
needrequest = bms_add_member(needrequest, i);
@@ -661,13 +446,13 @@ ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
return false;
/* Clear ms_needrequest flag, as we are going to send requests now */
- node->ms_needrequest = bms_del_members(node->ms_needrequest, needrequest);
+ node->ms.needrequest = bms_del_members(node->ms.needrequest, needrequest);
/* Make a new request for each of the async subplans that need it. */
i = -1;
while ((i = bms_next_member(needrequest, i)) >= 0)
{
- AsyncRequest *areq = node->ms_asyncrequests[i];
+ AsyncRequest *areq = node->ms.asyncrequests[i];
/*
* We've just checked that subplan doesn't already have some fetched
@@ -683,7 +468,7 @@ ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
/* Return needed asynchronously-generated results if any. */
if (bms_is_member(mplan, node->ms_has_asyncresults))
{
- node->ms_slots[mplan] = node->ms_asyncresults[mplan];
+ node->ms_slots[mplan] = node->ms.asyncresults[mplan];
return true;
}
@@ -707,7 +492,7 @@ ExecAsyncMergeAppendResponse(AsyncRequest *areq)
/* We should handle previous async result prior to getting new one */
Assert(!bms_is_member(areq->request_index, node->ms_has_asyncresults));
- node->ms_asyncresults[areq->request_index] = NULL;
+ node->ms.asyncresults[areq->request_index] = NULL;
/* Nothing to do if the request is pending. */
if (!areq->request_complete)
{
@@ -730,13 +515,13 @@ ExecAsyncMergeAppendResponse(AsyncRequest *areq)
node->ms_has_asyncresults = bms_add_member(node->ms_has_asyncresults,
areq->request_index);
/* Save result so we can return it. */
- node->ms_asyncresults[areq->request_index] = slot;
+ node->ms.asyncresults[areq->request_index] = slot;
/*
* Mark the subplan that returned a result as ready for a new request. We
* don't launch another one here immediately because it might complete.
*/
- node->ms_needrequest = bms_add_member(node->ms_needrequest,
+ node->ms.needrequest = bms_add_member(node->ms.needrequest,
areq->request_index);
}
@@ -749,101 +534,8 @@ ExecAsyncMergeAppendResponse(AsyncRequest *areq)
static void
ExecMergeAppendAsyncEventWait(MergeAppendState *node)
{
- int nevents = node->ms_nasyncplans + 2; /* one for PM death and
- * one for latch */
- WaitEvent occurred_event[EVENT_BUFFER_SIZE];
- int noccurred;
- int i;
-
/* We should never be called when there are no valid async subplans. */
Assert(bms_num_members(node->ms_asyncremain) > 0);
- node->ms_eventset = CreateWaitEventSet(CurrentResourceOwner, nevents);
- AddWaitEventToSet(node->ms_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
- NULL, NULL);
-
- /* Give each waiting subplan a chance to add an event. */
- i = -1;
- while ((i = bms_next_member(node->ms_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->ms_asyncrequests[i];
-
- if (areq->callback_pending)
- ExecAsyncConfigureWait(areq);
- }
-
- /*
- * No need for further processing if none of the subplans configured any
- * events.
- */
- if (GetNumRegisteredWaitEvents(node->ms_eventset) == 1)
- {
- FreeWaitEventSet(node->ms_eventset);
- node->ms_eventset = NULL;
- return;
- }
-
- /*
- * Add the process latch to the set, so that we wake up to process the
- * standard interrupts with CHECK_FOR_INTERRUPTS().
- *
- * NOTE: For historical reasons, it's important that this is added to the
- * WaitEventSet after the ExecAsyncConfigureWait() calls. Namely,
- * postgres_fdw calls "GetNumRegisteredWaitEvents(set) == 1" to check if
- * any other events are in the set. That's a poor design, it's
- * questionable for postgres_fdw to be doing that in the first place, but
- * we cannot change it now. The pattern has possibly been copied to other
- * extensions too.
- */
- AddWaitEventToSet(node->ms_eventset, WL_LATCH_SET, PGINVALID_SOCKET,
- MyLatch, NULL);
-
- /* Return at most EVENT_BUFFER_SIZE events in one call. */
- if (nevents > EVENT_BUFFER_SIZE)
- nevents = EVENT_BUFFER_SIZE;
-
- /*
- * Wait until at least one event occurs.
- */
- noccurred = WaitEventSetWait(node->ms_eventset, -1 /* no timeout */ , occurred_event,
- nevents, WAIT_EVENT_APPEND_READY);
- FreeWaitEventSet(node->ms_eventset);
- node->ms_eventset = NULL;
- if (noccurred == 0)
- return;
-
- /* Deliver notifications. */
- for (i = 0; i < noccurred; i++)
- {
- WaitEvent *w = &occurred_event[i];
-
- /*
- * Each waiting subplan should have registered its wait event with
- * user_data pointing back to its AsyncRequest.
- */
- if ((w->events & WL_SOCKET_READABLE) != 0)
- {
- AsyncRequest *areq = (AsyncRequest *) w->user_data;
-
- if (areq->callback_pending)
- {
- /*
- * Mark it as no longer needing a callback. We must do this
- * before dispatching the callback in case the callback resets
- * the flag.
- */
- areq->callback_pending = false;
-
- /* Do the actual work. */
- ExecAsyncNotify(areq);
- }
- }
-
- /* Handle standard interrupts */
- if ((w->events & WL_LATCH_SET) != 0)
- {
- ResetLatch(MyLatch);
- CHECK_FOR_INTERRUPTS();
- }
- }
+ ExecAppenderAsyncEventWait(&node->ms, -1 /* no timeout */ , WAIT_EVENT_APPEND_READY);
}
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 024a2b2fd84..2f4e2ae6d39 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -4751,14 +4751,14 @@ planstate_tree_walker_impl(PlanState *planstate,
switch (nodeTag(plan))
{
case T_Append:
- if (planstate_walk_members(((AppendState *) planstate)->appendplans,
- ((AppendState *) planstate)->as_nplans,
+ if (planstate_walk_members(((AppendState *) planstate)->as.plans,
+ ((AppendState *) planstate)->as.nplans,
walker, context))
return true;
break;
case T_MergeAppend:
- if (planstate_walk_members(((MergeAppendState *) planstate)->mergeplans,
- ((MergeAppendState *) planstate)->ms_nplans,
+ if (planstate_walk_members(((MergeAppendState *) planstate)->ms.plans,
+ ((MergeAppendState *) planstate)->ms.nplans,
walker, context))
return true;
break;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index fc0e5c62ea9..a7b7e3d45e0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1262,11 +1262,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
* child plans, to make cross-checking the sort info easier.
*/
plan = makeNode(Append);
- plan->plan.targetlist = tlist;
- plan->plan.qual = NIL;
- plan->plan.lefttree = NULL;
- plan->plan.righttree = NULL;
- plan->apprelids = rel->relids;
+ plan->ap.plan.targetlist = tlist;
+ plan->ap.plan.qual = NIL;
+ plan->ap.plan.lefttree = NULL;
+ plan->ap.plan.righttree = NULL;
+ plan->ap.apprelids = rel->relids;
if (pathkeys != NIL)
{
@@ -1285,7 +1285,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
&nodeSortOperators,
&nodeCollations,
&nodeNullsFirst);
- tlist_was_changed = (orig_tlist_length != list_length(plan->plan.targetlist));
+ tlist_was_changed = (orig_tlist_length != list_length(plan->ap.plan.targetlist));
}
/* If appropriate, consider async append */
@@ -1395,7 +1395,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
/* Set below if we find quals that we can use to run-time prune */
- plan->part_prune_index = -1;
+ plan->ap.part_prune_index = -1;
/*
* If any quals exist, they may be useful to perform further partition
@@ -1420,16 +1420,16 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ plan->ap.part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
prunequal);
}
- plan->appendplans = subplans;
+ plan->ap.subplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(&plan->ap.plan, (Path *) best_path);
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
@@ -1438,9 +1438,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
*/
if (tlist_was_changed && (flags & (CP_EXACT_TLIST | CP_SMALL_TLIST)))
{
- tlist = list_copy_head(plan->plan.targetlist, orig_tlist_length);
+ tlist = list_copy_head(plan->ap.plan.targetlist, orig_tlist_length);
return inject_projection_plan((Plan *) plan, tlist,
- plan->plan.parallel_safe);
+ plan->ap.plan.parallel_safe);
}
else
return (Plan *) plan;
@@ -1458,7 +1458,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
int flags)
{
MergeAppend *node = makeNode(MergeAppend);
- Plan *plan = &node->plan;
+ Plan *plan = &node->ap.plan;
List *tlist = build_path_tlist(root, &best_path->path);
int orig_tlist_length = list_length(tlist);
bool tlist_was_changed;
@@ -1479,7 +1479,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
plan->qual = NIL;
plan->lefttree = NULL;
plan->righttree = NULL;
- node->apprelids = rel->relids;
+ node->ap.apprelids = rel->relids;
consider_async = (enable_async_merge_append &&
!best_path->path.parallel_safe &&
@@ -1593,7 +1593,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
/* Set below if we find quals that we can use to run-time prune */
- node->part_prune_index = -1;
+ node->ap.part_prune_index = -1;
/*
* If any quals exist, they may be useful to perform further partition
@@ -1610,12 +1610,12 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
Assert(best_path->path.param_info == NULL);
if (prunequal != NIL)
- node->part_prune_index = make_partition_pruneinfo(root, rel,
+ node->ap.part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
prunequal);
}
- node->mergeplans = subplans;
+ node->ap.subplans = subplans;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index cd7ea1e6b58..a595f34c87b 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1850,10 +1850,10 @@ set_append_references(PlannerInfo *root,
* check quals. If it's got exactly one child plan, then it's not doing
* anything useful at all, and we can strip it out.
*/
- Assert(aplan->plan.qual == NIL);
+ Assert(aplan->ap.plan.qual == NIL);
/* First, we gotta recurse on the children */
- foreach(l, aplan->appendplans)
+ foreach(l, aplan->ap.subplans)
{
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
@@ -1866,11 +1866,11 @@ set_append_references(PlannerInfo *root,
* plan may execute the non-parallel aware child multiple times. (If you
* change these rules, update create_append_path to match.)
*/
- if (list_length(aplan->appendplans) == 1)
+ if (list_length(aplan->ap.subplans) == 1)
{
- Plan *p = (Plan *) linitial(aplan->appendplans);
+ Plan *p = (Plan *) linitial(aplan->ap.subplans);
- if (p->parallel_aware == aplan->plan.parallel_aware)
+ if (p->parallel_aware == aplan->ap.plan.parallel_aware)
return clean_up_removed_plan_level((Plan *) aplan, p);
}
@@ -1881,19 +1881,19 @@ set_append_references(PlannerInfo *root,
*/
set_dummy_tlist_references((Plan *) aplan, rtoffset);
- aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ aplan->ap.apprelids = offset_relid_set(aplan->ap.apprelids, rtoffset);
/*
* Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
* Also update the RT indexes present in it to add the offset.
*/
- if (aplan->part_prune_index >= 0)
- aplan->part_prune_index =
- register_partpruneinfo(root, aplan->part_prune_index, rtoffset);
+ if (aplan->ap.part_prune_index >= 0)
+ aplan->ap.part_prune_index =
+ register_partpruneinfo(root, aplan->ap.part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
- Assert(aplan->plan.lefttree == NULL);
- Assert(aplan->plan.righttree == NULL);
+ Assert(aplan->ap.plan.lefttree == NULL);
+ Assert(aplan->ap.plan.righttree == NULL);
return (Plan *) aplan;
}
@@ -1917,10 +1917,10 @@ set_mergeappend_references(PlannerInfo *root,
* or check quals. If it's got exactly one child plan, then it's not
* doing anything useful at all, and we can strip it out.
*/
- Assert(mplan->plan.qual == NIL);
+ Assert(mplan->ap.plan.qual == NIL);
/* First, we gotta recurse on the children */
- foreach(l, mplan->mergeplans)
+ foreach(l, mplan->ap.subplans)
{
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
@@ -1934,11 +1934,11 @@ set_mergeappend_references(PlannerInfo *root,
* multiple times. (If you change these rules, update
* create_merge_append_path to match.)
*/
- if (list_length(mplan->mergeplans) == 1)
+ if (list_length(mplan->ap.subplans) == 1)
{
- Plan *p = (Plan *) linitial(mplan->mergeplans);
+ Plan *p = (Plan *) linitial(mplan->ap.subplans);
- if (p->parallel_aware == mplan->plan.parallel_aware)
+ if (p->parallel_aware == mplan->ap.plan.parallel_aware)
return clean_up_removed_plan_level((Plan *) mplan, p);
}
@@ -1949,19 +1949,19 @@ set_mergeappend_references(PlannerInfo *root,
*/
set_dummy_tlist_references((Plan *) mplan, rtoffset);
- mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ mplan->ap.apprelids = offset_relid_set(mplan->ap.apprelids, rtoffset);
/*
* Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
* Also update the RT indexes present in it to add the offset.
*/
- if (mplan->part_prune_index >= 0)
- mplan->part_prune_index =
- register_partpruneinfo(root, mplan->part_prune_index, rtoffset);
+ if (mplan->ap.part_prune_index >= 0)
+ mplan->ap.part_prune_index =
+ register_partpruneinfo(root, mplan->ap.part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
- Assert(mplan->plan.lefttree == NULL);
- Assert(mplan->plan.righttree == NULL);
+ Assert(mplan->ap.plan.lefttree == NULL);
+ Assert(mplan->ap.plan.righttree == NULL);
return (Plan *) mplan;
}
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index ff63d20f8d5..eb616c977bc 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2759,7 +2759,7 @@ finalize_plan(PlannerInfo *root, Plan *plan,
case T_Append:
{
- foreach(l, ((Append *) plan)->appendplans)
+ foreach(l, ((Append *) plan)->ap.subplans)
{
context.paramids =
bms_add_members(context.paramids,
@@ -2774,7 +2774,7 @@ finalize_plan(PlannerInfo *root, Plan *plan,
case T_MergeAppend:
{
- foreach(l, ((MergeAppend *) plan)->mergeplans)
+ foreach(l, ((MergeAppend *) plan)->ap.subplans)
{
context.paramids =
bms_add_members(context.paramids,
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 9f85eb86da1..ce57f80e5e3 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -5163,9 +5163,9 @@ set_deparse_plan(deparse_namespace *dpns, Plan *plan)
* natural choice.
*/
if (IsA(plan, Append))
- dpns->outer_plan = linitial(((Append *) plan)->appendplans);
+ dpns->outer_plan = linitial(((Append *) plan)->ap.subplans);
else if (IsA(plan, MergeAppend))
- dpns->outer_plan = linitial(((MergeAppend *) plan)->mergeplans);
+ dpns->outer_plan = linitial(((MergeAppend *) plan)->ap.subplans);
else
dpns->outer_plan = outerPlan(plan);
@@ -7955,10 +7955,10 @@ resolve_special_varno(Node *node, deparse_context *context,
if (IsA(dpns->plan, Append))
context->appendparents = bms_union(context->appendparents,
- ((Append *) dpns->plan)->apprelids);
+ ((Append *) dpns->plan)->ap.apprelids);
else if (IsA(dpns->plan, MergeAppend))
context->appendparents = bms_union(context->appendparents,
- ((MergeAppend *) dpns->plan)->apprelids);
+ ((MergeAppend *) dpns->plan)->ap.apprelids);
push_child_plan(dpns, dpns->outer_plan, &save_dpns);
resolve_special_varno((Node *) tle->expr, context,
diff --git a/src/include/executor/execAppend.h b/src/include/executor/execAppend.h
new file mode 100644
index 00000000000..b5509373524
--- /dev/null
+++ b/src/include/executor/execAppend.h
@@ -0,0 +1,60 @@
+/*-------------------------------------------------------------------------
+ * execAppend.h
+ * Support functions for MergeAppend and Append nodes.
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/include/executor/execAppend.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef EXECAPPEND_H
+#define EXECAPPEND_H
+
+#include "nodes/execnodes.h"
+
+void ExecInitAppender(AppenderState * state,
+ Appender * node,
+ EState *estate,
+ int eflags,
+ int first_partial_plan,
+ int *first_valid_partial_plan);
+
+void ExecEndAppender(AppenderState * node);
+
+void ExecReScanAppender(AppenderState * node);
+
+void ExecAppenderAsyncBegin(AppenderState * node);
+
+void ExecAppenderAsyncEventWait(AppenderState * node,
+ int timeout,
+ uint32 wait_event_info);
+
+/* Common part of classify_matching_subplans() for Append and MergeAppend */
+static inline bool
+classify_matching_subplans_common(Bitmapset **valid_subplans,
+ Bitmapset *asyncplans,
+ Bitmapset **valid_asyncplans)
+{
+ Assert(*valid_asyncplans == NULL);
+
+ /* Checked by classify_matching_subplans() */
+ Assert(!bms_is_empty(*valid_subplans));
+
+ /* Nothing to do if there are no valid async subplans. */
+ if (!bms_overlap(*valid_subplans, asyncplans))
+ return false;
+
+ /* Get valid async subplans. */
+ *valid_asyncplans = bms_intersect(asyncplans,
+ *valid_subplans);
+
+ /* Adjust the valid subplans to contain sync subplans only. */
+ *valid_subplans = bms_del_members(*valid_subplans,
+ *valid_asyncplans);
+ return true;
+}
+
+#endif /* EXECAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5887cbf4f16..501f2c3acf4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1472,6 +1472,27 @@ typedef struct ModifyTableState
List *mt_mergeJoinConditions;
} ModifyTableState;
+typedef struct AppenderState
+{
+ PlanState ps; /* its first field is NodeTag */
+ PlanState **plans; /* array of PlanStates for my inputs */
+ int nplans;
+
+ /* Asynchronous execution state */
+ Bitmapset *asyncplans; /* asynchronous plans indexes */
+ int nasyncplans; /* # of asynchronous plans */
+ AsyncRequest **asyncrequests; /* array of AsyncRequests */
+ TupleTableSlot **asyncresults; /* unreturned results of async plans */
+ Bitmapset *needrequest; /* asynchronous plans needing a new request */
+ struct WaitEventSet *eventset; /* WaitEventSet for file descriptor waits */
+
+ /* Partition pruning state */
+ struct PartitionPruneState *prune_state;
+ bool valid_subplans_identified;
+ Bitmapset *valid_subplans;
+ Bitmapset *valid_asyncplans; /* valid asynchronous plans indexes */
+} AppenderState;
+
/* ----------------
* AppendState information
*
@@ -1493,31 +1514,20 @@ struct PartitionPruneState;
struct AppendState
{
- PlanState ps; /* its first field is NodeTag */
- PlanState **appendplans; /* array of PlanStates for my inputs */
- int as_nplans;
+ AppenderState as;
+
int as_whichplan;
bool as_begun; /* false means need to initialize */
- Bitmapset *as_asyncplans; /* asynchronous plans indexes */
- int as_nasyncplans; /* # of asynchronous plans */
- AsyncRequest **as_asyncrequests; /* array of AsyncRequests */
- TupleTableSlot **as_asyncresults; /* unreturned results of async plans */
- int as_nasyncresults; /* # of valid entries in as_asyncresults */
- bool as_syncdone; /* true if all synchronous plans done in
- * asynchronous mode, else false */
+ int as_nasyncresults; /* # of valid entries in asyncresults */
+ bool as_syncdone; /* all sync plans done in async mode? */
int as_nasyncremain; /* # of remaining asynchronous plans */
- Bitmapset *as_needrequest; /* asynchronous plans needing a new request */
- struct WaitEventSet *as_eventset; /* WaitEventSet used to configure file
- * descriptor wait events */
- int as_first_partial_plan; /* Index of 'appendplans' containing
- * the first partial plan */
- ParallelAppendState *as_pstate; /* parallel coordination info */
- Size pstate_len; /* size of parallel coordination info */
- struct PartitionPruneState *as_prune_state;
- bool as_valid_subplans_identified; /* is as_valid_subplans valid? */
- Bitmapset *as_valid_subplans;
- Bitmapset *as_valid_asyncplans; /* valid asynchronous plans indexes */
- bool (*choose_next_subplan) (AppendState *);
+ int as_first_partial_plan;
+
+ /* Parallel append specific */
+ ParallelAppendState *as_pstate;
+ Size pstate_len;
+
+ bool (*choose_next_subplan) (struct AppendState *);
};
/* ----------------
@@ -1537,27 +1547,17 @@ struct AppendState
*/
typedef struct MergeAppendState
{
- PlanState ps; /* its first field is NodeTag */
- PlanState **mergeplans; /* array of PlanStates for my inputs */
- int ms_nplans;
+ AppenderState ms;
+
int ms_nkeys;
SortSupport ms_sortkeys; /* array of length ms_nkeys */
TupleTableSlot **ms_slots; /* array of length ms_nplans */
struct binaryheap *ms_heap; /* binary heap of slot indices */
bool ms_initialized; /* are subplans started? */
- Bitmapset *ms_asyncplans; /* asynchronous plans indexes */
- int ms_nasyncplans; /* # of asynchronous plans */
- AsyncRequest **ms_asyncrequests; /* array of AsyncRequests */
- TupleTableSlot **ms_asyncresults; /* unreturned results of async plans */
+
+ /* Merge-specific async tracking */
Bitmapset *ms_has_asyncresults; /* plans which have async results */
Bitmapset *ms_asyncremain; /* remaining asynchronous plans */
- Bitmapset *ms_needrequest; /* asynchronous plans needing a new request */
- struct WaitEventSet *ms_eventset; /* WaitEventSet used to configure file
- * descriptor wait events */
- struct PartitionPruneState *ms_prune_state;
- bool ms_valid_subplans_identified; /* is ms_valid_subplans valid? */
- Bitmapset *ms_valid_subplans;
- Bitmapset *ms_valid_asyncplans; /* valid asynchronous plans indexes */
} MergeAppendState;
/* Getters for AppendState and MergeAppendState */
@@ -1567,9 +1567,9 @@ GetAppendEventSet(PlanState *ps)
Assert(IsA(ps, AppendState) || IsA(ps, MergeAppendState));
if (IsA(ps, AppendState))
- return ((AppendState *) ps)->as_eventset;
+ return ((AppendState *) ps)->as.eventset;
else
- return ((MergeAppendState *) ps)->ms_eventset;
+ return ((MergeAppendState *) ps)->ms.eventset;
}
static inline Bitmapset *
@@ -1578,34 +1578,9 @@ GetNeedRequest(PlanState *ps)
Assert(IsA(ps, AppendState) || IsA(ps, MergeAppendState));
if (IsA(ps, AppendState))
- return ((AppendState *) ps)->as_needrequest;
+ return ((AppendState *) ps)->as.needrequest;
else
- return ((MergeAppendState *) ps)->ms_needrequest;
-}
-
-/* Common part of classify_matching_subplans() for Append and MergeAppend */
-static inline bool
-classify_matching_subplans_common(Bitmapset **valid_subplans,
- Bitmapset *asyncplans,
- Bitmapset **valid_asyncplans)
-{
- Assert(*valid_asyncplans == NULL);
-
- /* Checked by classify_matching_subplans() */
- Assert(!bms_is_empty(*valid_subplans));
-
- /* Nothing to do if there are no valid async subplans. */
- if (!bms_overlap(*valid_subplans, asyncplans))
- return false;
-
- /* Get valid async subplans. */
- *valid_asyncplans = bms_intersect(asyncplans,
- *valid_subplans);
-
- /* Adjust the valid subplans to contain sync subplans only. */
- *valid_subplans = bms_del_members(*valid_subplans,
- *valid_asyncplans);
- return true;
+ return ((MergeAppendState *) ps)->ms.needrequest;
}
/* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..30c20e80b40 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -380,6 +380,20 @@ typedef struct ModifyTable
struct PartitionPruneInfo; /* forward reference to struct below */
+typedef struct Appender
+{
+ Plan plan; /* its first field is NodeTag */
+ Bitmapset *apprelids; /* RTIs of appendrel(s) formed by this node */
+ List *subplans; /* List of Plans (formerly
+ * appendplans/mergeplans) */
+
+ /*
+ * Index into PlannedStmt.partPruneInfos and parallel lists in EState. Set
+ * to -1 if no run-time pruning is used.
+ */
+ int part_prune_index;
+} Appender;
+
/* ----------------
* Append node -
* Generate the concatenation of the results of sub-plans.
@@ -387,25 +401,16 @@ struct PartitionPruneInfo; /* forward reference to struct below */
*/
typedef struct Append
{
- Plan plan;
- /* RTIs of appendrel(s) formed by this node */
- Bitmapset *apprelids;
- List *appendplans;
+ Appender ap;
+
/* # of asynchronous plans */
int nasyncplans;
/*
- * All 'appendplans' preceding this index are non-partial plans. All
- * 'appendplans' from this index onwards are partial plans.
+ * All 'subplans' preceding this index are non-partial plans. All
+ * 'subplans' from this index onwards are partial plans.
*/
int first_partial_plan;
-
- /*
- * Index into PlannedStmt.partPruneInfos and parallel lists in EState:
- * es_part_prune_states and es_part_prune_results. Set to -1 if no
- * run-time pruning is used.
- */
- int part_prune_index;
} Append;
/* ----------------
@@ -415,12 +420,7 @@ typedef struct Append
*/
typedef struct MergeAppend
{
- Plan plan;
-
- /* RTIs of appendrel(s) formed by this node */
- Bitmapset *apprelids;
-
- List *mergeplans;
+ Appender ap;
/* these fields are just like the sort-key info in struct Sort: */
@@ -438,13 +438,6 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
-
- /*
- * Index into PlannedStmt.partPruneInfos and parallel lists in EState:
- * es_part_prune_states and es_part_prune_results. Set to -1 if no
- * run-time pruning is used.
- */
- int part_prune_index;
} MergeAppend;
/* ----------------
--
2.51.2
Attachments:
[text/plain] v12-0001-mark_async_capable-subpath-should-match-subplan.patch (2.4K, 2-v12-0001-mark_async_capable-subpath-should-match-subplan.patch)
download | inline diff:
From 14a51e23a506dc7d412dc7130c26b679fed520b9 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov <[email protected]>
Date: Sat, 15 Nov 2025 10:16:25 +0300
Subject: [PATCH v12 1/3] mark_async_capable(): subpath should match subplan
mark_async_capable() believes that path corresponds to plan. This is
not true when create_[merge_]append_plan() inserts sort node. In
this case mark_async_capable() can treat Sort plan node as some
other and crash. Fix this by handling the Sort node separately.
This is needed to make MergeAppend node async-capable that will
be implemented in a next commit.
---
src/backend/optimizer/plan/createplan.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index f1a01cfc544..05fac693989 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1139,10 +1139,12 @@ mark_async_capable_plan(Plan *plan, Path *path)
SubqueryScan *scan_plan = (SubqueryScan *) plan;
/*
- * If the generated plan node includes a gating Result node,
- * we can't execute it asynchronously.
+ * Check that plan is really a SubqueryScan before using it.
+ * It can be not true, if the generated plan node includes a
+ * gating Result node or a Sort node. In such case we can't
+ * execute it asynchronously.
*/
- if (IsA(plan, Result))
+ if (!IsA(plan, SubqueryScan))
return false;
/*
@@ -1160,10 +1162,10 @@ mark_async_capable_plan(Plan *plan, Path *path)
FdwRoutine *fdwroutine = path->parent->fdwroutine;
/*
- * If the generated plan node includes a gating Result node,
- * we can't execute it asynchronously.
+ * If the generated plan node includes a gating Result node or
+ * a Sort node, we can't execute it asynchronously.
*/
- if (IsA(plan, Result))
+ if (IsA(plan, Result) || IsA(plan, Sort))
return false;
Assert(fdwroutine != NULL);
@@ -1176,9 +1178,9 @@ mark_async_capable_plan(Plan *plan, Path *path)
/*
* If the generated plan node includes a Result node for the
- * projection, we can't execute it asynchronously.
+ * projection or a Sort node, we can't execute it asynchronously.
*/
- if (IsA(plan, Result))
+ if (IsA(plan, Result) || IsA(plan, Sort))
return false;
/*
--
2.51.2
[text/plain] v12-0002-MergeAppend-should-support-Async-Foreign-Scan-su.patch (50.3K, 3-v12-0002-MergeAppend-should-support-Async-Foreign-Scan-su.patch)
download | inline diff:
From c13180f186a924d3ec6987ea8c14eeb61bab10d7 Mon Sep 17 00:00:00 2001
From: Alexander Pyhalov <[email protected]>
Date: Sat, 15 Nov 2025 10:23:47 +0300
Subject: [PATCH v12 2/3] MergeAppend should support Async Foreign Scan
subplans
---
.../postgres_fdw/expected/postgres_fdw.out | 288 +++++++++++
contrib/postgres_fdw/postgres_fdw.c | 10 +-
contrib/postgres_fdw/sql/postgres_fdw.sql | 87 ++++
doc/src/sgml/config.sgml | 14 +
src/backend/executor/execAsync.c | 4 +
src/backend/executor/nodeAppend.c | 24 +-
src/backend/executor/nodeMergeAppend.c | 471 +++++++++++++++++-
src/backend/optimizer/path/costsize.c | 1 +
src/backend/optimizer/plan/createplan.c | 9 +
src/backend/utils/misc/guc_parameters.dat | 8 +
src/backend/utils/misc/postgresql.conf.sample | 1 +
src/include/executor/nodeMergeAppend.h | 1 +
src/include/nodes/execnodes.h | 59 +++
src/include/optimizer/cost.h | 1 +
src/test/regress/expected/sysviews.out | 3 +-
15 files changed, 951 insertions(+), 30 deletions(-)
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 6066510c7c0..1b35990b4c0 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -11556,6 +11556,46 @@ SELECT * FROM result_tbl ORDER BY a;
(2 rows)
DELETE FROM result_tbl;
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: async_pt.b, async_pt.a
+ -> Async Foreign Scan on public.async_p1 async_pt_1
+ Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
+ Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p2 async_pt_2
+ Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
+ Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b % 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+(8 rows)
+
+SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
+ a | b | c
+------+-----+------
+ 1000 | 0 | 0000
+ 2000 | 0 | 0000
+ 1100 | 100 | 0100
+ 2100 | 100 | 0100
+ 1200 | 200 | 0200
+ 2200 | 200 | 0200
+ 1300 | 300 | 0300
+ 2300 | 300 | 0300
+ 1400 | 400 | 0400
+ 2400 | 400 | 0400
+ 1500 | 500 | 0500
+ 2500 | 500 | 0500
+ 1600 | 600 | 0600
+ 2600 | 600 | 0600
+ 1700 | 700 | 0700
+ 2700 | 700 | 0700
+ 1800 | 800 | 0800
+ 2800 | 800 | 0800
+ 1900 | 900 | 0900
+ 2900 | 900 | 0900
+(20 rows)
+
-- Test error handling, if accessing one of the foreign partitions errors out
CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (10000) TO (10001)
SERVER loopback OPTIONS (table_name 'non_existent_table');
@@ -11604,6 +11644,76 @@ COPY async_pt TO stdout; --error
ERROR: cannot copy from foreign table "async_p1"
DETAIL: Partition "async_p1" is a foreign table in partitioned table "async_pt"
HINT: Try the COPY (SELECT ...) TO variant.
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: async_pt.b, async_pt.a
+ -> Async Foreign Scan on public.async_p1 async_pt_1
+ Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
+ Filter: (async_pt_1.b === 505)
+ Remote SQL: SELECT a, b, c FROM public.base_tbl1 ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p2 async_pt_2
+ Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
+ Filter: (async_pt_2.b === 505)
+ Remote SQL: SELECT a, b, c FROM public.base_tbl2 ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p3 async_pt_3
+ Output: async_pt_3.a, async_pt_3.b, async_pt_3.c
+ Filter: (async_pt_3.b === 505)
+ Remote SQL: SELECT a, b, c FROM public.base_tbl3 ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+(14 rows)
+
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+ a | b | c
+------+-----+------
+ 1505 | 505 | 0505
+ 2505 | 505 | 0505
+ 3505 | 505 | 0505
+(3 rows)
+
+-- Test async Merge Append rescan
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT
+ ARRAY(SELECT f.i FROM (SELECT b + g.i FROM async_pt WHERE a > g.i ORDER BY b) f(i) ORDER BY f.i LIMIT 10)
+FROM generate_series(1, 3) g(i);
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------
+ Function Scan on pg_catalog.generate_series g
+ Output: ARRAY(SubPlan array_1)
+ Function Call: generate_series(1, 3)
+ SubPlan array_1
+ -> Limit
+ Output: f.i
+ -> Sort
+ Output: f.i
+ Sort Key: f.i
+ -> Subquery Scan on f
+ Output: f.i
+ -> Merge Append
+ Sort Key: async_pt.b
+ -> Async Foreign Scan on public.async_p1 async_pt_1
+ Output: (async_pt_1.b + g.i), async_pt_1.b
+ Remote SQL: SELECT b FROM public.base_tbl1 WHERE ((a > $1::integer)) ORDER BY b ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p2 async_pt_2
+ Output: (async_pt_2.b + g.i), async_pt_2.b
+ Remote SQL: SELECT b FROM public.base_tbl2 WHERE ((a > $1::integer)) ORDER BY b ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p3 async_pt_3
+ Output: (async_pt_3.b + g.i), async_pt_3.b
+ Remote SQL: SELECT b FROM public.base_tbl3 WHERE ((a > $1::integer)) ORDER BY b ASC NULLS LAST
+(22 rows)
+
+SELECT
+ ARRAY(SELECT f.i FROM (SELECT b + g.i FROM async_pt WHERE a > g.i ORDER BY b) f(i) ORDER BY f.i LIMIT 10)
+FROM generate_series(1, 3) g(i);
+ array
+---------------------------
+ {1,1,1,6,6,6,11,11,11,16}
+ {2,2,2,7,7,7,12,12,12,17}
+ {3,3,3,8,8,8,13,13,13,18}
+(3 rows)
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
-- Check case where the partitioned table has local/remote partitions
@@ -11639,6 +11749,37 @@ SELECT * FROM result_tbl ORDER BY a;
(3 rows)
DELETE FROM result_tbl;
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------
+ Merge Append
+ Sort Key: async_pt.b, async_pt.a
+ -> Async Foreign Scan on public.async_p1 async_pt_1
+ Output: async_pt_1.a, async_pt_1.b, async_pt_1.c
+ Filter: (async_pt_1.b === 505)
+ Remote SQL: SELECT a, b, c FROM public.base_tbl1 ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+ -> Async Foreign Scan on public.async_p2 async_pt_2
+ Output: async_pt_2.a, async_pt_2.b, async_pt_2.c
+ Filter: (async_pt_2.b === 505)
+ Remote SQL: SELECT a, b, c FROM public.base_tbl2 ORDER BY b ASC NULLS LAST, a ASC NULLS LAST
+ -> Sort
+ Output: async_pt_3.a, async_pt_3.b, async_pt_3.c
+ Sort Key: async_pt_3.b, async_pt_3.a
+ -> Seq Scan on public.async_p3 async_pt_3
+ Output: async_pt_3.a, async_pt_3.b, async_pt_3.c
+ Filter: (async_pt_3.b === 505)
+(16 rows)
+
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+ a | b | c
+------+-----+------
+ 1505 | 505 | 0505
+ 2505 | 505 | 0505
+ 3505 | 505 | 0505
+(3 rows)
+
-- partitionwise joins
SET enable_partitionwise_join TO true;
CREATE TABLE join_tbl (a1 int, b1 int, c1 text, a2 int, b2 int, c2 text);
@@ -12421,6 +12562,153 @@ SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM f
DROP FOREIGN TABLE foreign_tbl CASCADE;
NOTICE: drop cascades to foreign table foreign_tbl2
DROP TABLE base_tbl;
+-- Test async Merge Append
+CREATE TABLE distr1 (i int, j int, k text) PARTITION BY HASH (i);
+CREATE TABLE base1 (i int, j int, k text);
+CREATE TABLE base2 (i int, j int, k text);
+CREATE FOREIGN TABLE distr1_p1 PARTITION OF distr1 FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+SERVER loopback OPTIONS (table_name 'base1');
+CREATE FOREIGN TABLE distr1_p2 PARTITION OF distr1 FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+SERVER loopback OPTIONS (table_name 'base2');
+CREATE TABLE distr2 (i int, j int, k text) PARTITION BY HASH (i);
+CREATE TABLE base3 (i int, j int, k text);
+CREATE TABLE base4 (i int, j int, k text);
+CREATE FOREIGN TABLE distr2_p1 PARTITION OF distr2 FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+SERVER loopback OPTIONS (table_name 'base3');
+CREATE FOREIGN TABLE distr2_p2 PARTITION OF distr2 FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+SERVER loopback OPTIONS (table_name 'base4');
+INSERT INTO distr1
+SELECT i, i*10, 'data_' || i FROM generate_series(1, 1000) i;
+INSERT INTO distr2
+SELECT i, i*10, 'data_' || i FROM generate_series(1, 100) i;
+ANALYZE distr1_p1;
+ANALYZE distr1_p2;
+ANALYZE distr2_p1;
+ANALYZE distr2_p2;
+SET enable_partitionwise_join TO ON;
+-- Test joins with async Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM distr1, distr2 WHERE distr1.i=distr2.i AND distr2.j > 90 and distr2.k like 'data%'
+ORDER BY distr2.i LIMIT 10;
+ QUERY PLAN
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+ Output: distr1.i, distr1.j, distr1.k, distr2.i, distr2.j, distr2.k
+ -> Merge Append
+ Sort Key: distr1.i
+ -> Async Foreign Scan
+ Output: distr1_1.i, distr1_1.j, distr1_1.k, distr2_1.i, distr2_1.j, distr2_1.k
+ Relations: (public.distr1_p1 distr1_1) INNER JOIN (public.distr2_p1 distr2_1)
+ Remote SQL: SELECT r3.i, r3.j, r3.k, r5.i, r5.j, r5.k FROM (public.base1 r3 INNER JOIN public.base3 r5 ON (((r3.i = r5.i)) AND ((r5.j > 90)) AND ((r5.k ~~ 'data%')))) ORDER BY r3.i ASC NULLS LAST
+ -> Async Foreign Scan
+ Output: distr1_2.i, distr1_2.j, distr1_2.k, distr2_2.i, distr2_2.j, distr2_2.k
+ Relations: (public.distr1_p2 distr1_2) INNER JOIN (public.distr2_p2 distr2_2)
+ Remote SQL: SELECT r4.i, r4.j, r4.k, r6.i, r6.j, r6.k FROM (public.base2 r4 INNER JOIN public.base4 r6 ON (((r4.i = r6.i)) AND ((r6.j > 90)) AND ((r6.k ~~ 'data%')))) ORDER BY r4.i ASC NULLS LAST
+(12 rows)
+
+SELECT * FROM distr1, distr2 WHERE distr1.i=distr2.i AND distr2.j > 90 and distr2.k like 'data%'
+ORDER BY distr2.i LIMIT 10;
+ i | j | k | i | j | k
+----+-----+---------+----+-----+---------
+ 10 | 100 | data_10 | 10 | 100 | data_10
+ 11 | 110 | data_11 | 11 | 110 | data_11
+ 12 | 120 | data_12 | 12 | 120 | data_12
+ 13 | 130 | data_13 | 13 | 130 | data_13
+ 14 | 140 | data_14 | 14 | 140 | data_14
+ 15 | 150 | data_15 | 15 | 150 | data_15
+ 16 | 160 | data_16 | 16 | 160 | data_16
+ 17 | 170 | data_17 | 17 | 170 | data_17
+ 18 | 180 | data_18 | 18 | 180 | data_18
+ 19 | 190 | data_19 | 19 | 190 | data_19
+(10 rows)
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM distr1 LEFT JOIN distr2 ON distr1.i=distr2.i AND distr2.k like 'data%' WHERE distr1.i > 90
+ORDER BY distr1.i LIMIT 20;
+ QUERY PLAN
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+ Output: distr1.i, distr1.j, distr1.k, distr2.i, distr2.j, distr2.k
+ -> Merge Append
+ Sort Key: distr1.i
+ -> Async Foreign Scan
+ Output: distr1_1.i, distr1_1.j, distr1_1.k, distr2_1.i, distr2_1.j, distr2_1.k
+ Relations: (public.distr1_p1 distr1_1) LEFT JOIN (public.distr2_p1 distr2_1)
+ Remote SQL: SELECT r4.i, r4.j, r4.k, r6.i, r6.j, r6.k FROM (public.base1 r4 LEFT JOIN public.base3 r6 ON (((r4.i = r6.i)) AND ((r6.k ~~ 'data%')))) WHERE ((r4.i > 90)) ORDER BY r4.i ASC NULLS LAST
+ -> Async Foreign Scan
+ Output: distr1_2.i, distr1_2.j, distr1_2.k, distr2_2.i, distr2_2.j, distr2_2.k
+ Relations: (public.distr1_p2 distr1_2) LEFT JOIN (public.distr2_p2 distr2_2)
+ Remote SQL: SELECT r5.i, r5.j, r5.k, r7.i, r7.j, r7.k FROM (public.base2 r5 LEFT JOIN public.base4 r7 ON (((r5.i = r7.i)) AND ((r7.k ~~ 'data%')))) WHERE ((r5.i > 90)) ORDER BY r5.i ASC NULLS LAST
+(12 rows)
+
+SELECT * FROM distr1 LEFT JOIN distr2 ON distr1.i=distr2.i AND distr2.k like 'data%' WHERE distr1.i > 90
+ORDER BY distr1.i LIMIT 20;
+ i | j | k | i | j | k
+-----+------+----------+-----+------+----------
+ 91 | 910 | data_91 | 91 | 910 | data_91
+ 92 | 920 | data_92 | 92 | 920 | data_92
+ 93 | 930 | data_93 | 93 | 930 | data_93
+ 94 | 940 | data_94 | 94 | 940 | data_94
+ 95 | 950 | data_95 | 95 | 950 | data_95
+ 96 | 960 | data_96 | 96 | 960 | data_96
+ 97 | 970 | data_97 | 97 | 970 | data_97
+ 98 | 980 | data_98 | 98 | 980 | data_98
+ 99 | 990 | data_99 | 99 | 990 | data_99
+ 100 | 1000 | data_100 | 100 | 1000 | data_100
+ 101 | 1010 | data_101 | | |
+ 102 | 1020 | data_102 | | |
+ 103 | 1030 | data_103 | | |
+ 104 | 1040 | data_104 | | |
+ 105 | 1050 | data_105 | | |
+ 106 | 1060 | data_106 | | |
+ 107 | 1070 | data_107 | | |
+ 108 | 1080 | data_108 | | |
+ 109 | 1090 | data_109 | | |
+ 110 | 1100 | data_110 | | |
+(20 rows)
+
+-- Test pruning with async Merge Append
+DELETE FROM distr2;
+INSERT INTO distr2
+SELECT i%10, i*10, 'data_' || i FROM generate_series(1, 1000) i;
+DEALLOCATE ALL;
+SET plan_cache_mode TO force_generic_plan;
+PREPARE async_pt_query (int, int) AS
+ SELECT * FROM distr2 WHERE i = ANY(ARRAY[$1, $2])
+ ORDER BY i,j
+ LIMIT 10;
+EXPLAIN (VERBOSE, COSTS OFF)
+ EXECUTE async_pt_query(1, 1);
+ QUERY PLAN
+------------------------------------------------------------------------------------------------------------------------------------------------------------
+ Limit
+ Output: distr2.i, distr2.j, distr2.k
+ -> Merge Append
+ Sort Key: distr2.i, distr2.j
+ Subplans Removed: 1
+ -> Async Foreign Scan on public.distr2_p1 distr2_1
+ Output: distr2_1.i, distr2_1.j, distr2_1.k
+ Remote SQL: SELECT i, j, k FROM public.base3 WHERE ((i = ANY (ARRAY[$1::integer, $2::integer]))) ORDER BY i ASC NULLS LAST, j ASC NULLS LAST
+(8 rows)
+
+EXECUTE async_pt_query(1, 1);
+ i | j | k
+---+-----+---------
+ 1 | 10 | data_1
+ 1 | 110 | data_11
+ 1 | 210 | data_21
+ 1 | 310 | data_31
+ 1 | 410 | data_41
+ 1 | 510 | data_51
+ 1 | 610 | data_61
+ 1 | 710 | data_71
+ 1 | 810 | data_81
+ 1 | 910 | data_91
+(10 rows)
+
+RESET plan_cache_mode;
+RESET enable_partitionwise_join;
+DROP TABLE distr1, distr2, base1, base2, base3, base4;
ALTER SERVER loopback OPTIONS (DROP async_capable);
ALTER SERVER loopback2 OPTIONS (DROP async_capable);
-- ===================================================================
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 5e178c21b39..bd551a1db72 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -7213,12 +7213,16 @@ postgresForeignAsyncConfigureWait(AsyncRequest *areq)
ForeignScanState *node = (ForeignScanState *) areq->requestee;
PgFdwScanState *fsstate = (PgFdwScanState *) node->fdw_state;
AsyncRequest *pendingAreq = fsstate->conn_state->pendingAreq;
- AppendState *requestor = (AppendState *) areq->requestor;
- WaitEventSet *set = requestor->as_eventset;
+ PlanState *requestor = areq->requestor;
+ WaitEventSet *set;
+ Bitmapset *needrequest;
/* This should not be called unless callback_pending */
Assert(areq->callback_pending);
+ set = GetAppendEventSet(requestor);
+ needrequest = GetNeedRequest(requestor);
+
/*
* If process_pending_request() has been invoked on the given request
* before we get here, we might have some tuples already; in which case
@@ -7256,7 +7260,7 @@ postgresForeignAsyncConfigureWait(AsyncRequest *areq)
* below, because we might otherwise end up with no configured events
* other than the postmaster death event.
*/
- if (!bms_is_empty(requestor->as_needrequest))
+ if (!bms_is_empty(needrequest))
return;
if (GetNumRegisteredWaitEvents(set) > 1)
return;
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 4f7ab2ed0ac..89188dff783 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3929,6 +3929,11 @@ INSERT INTO result_tbl SELECT a, b, 'AAA' || c FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
+SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a;
+
-- Test error handling, if accessing one of the foreign partitions errors out
CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (10000) TO (10001)
SERVER loopback OPTIONS (table_name 'non_existent_table');
@@ -3952,6 +3957,20 @@ DELETE FROM result_tbl;
-- Test COPY TO when foreign table is partition
COPY async_pt TO stdout; --error
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+
+-- Test async Merge Append rescan
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT
+ ARRAY(SELECT f.i FROM (SELECT b + g.i FROM async_pt WHERE a > g.i ORDER BY b) f(i) ORDER BY f.i LIMIT 10)
+FROM generate_series(1, 3) g(i);
+SELECT
+ ARRAY(SELECT f.i FROM (SELECT b + g.i FROM async_pt WHERE a > g.i ORDER BY b) f(i) ORDER BY f.i LIMIT 10)
+FROM generate_series(1, 3) g(i);
+
DROP FOREIGN TABLE async_p3;
DROP TABLE base_tbl3;
@@ -3967,6 +3986,11 @@ INSERT INTO result_tbl SELECT * FROM async_pt WHERE b === 505;
SELECT * FROM result_tbl ORDER BY a;
DELETE FROM result_tbl;
+-- Test Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+SELECT * FROM async_pt WHERE b === 505 ORDER BY b, a;
+
-- partitionwise joins
SET enable_partitionwise_join TO true;
@@ -4205,6 +4229,69 @@ SELECT a FROM base_tbl WHERE (a, random() > 0) IN (SELECT a, random() > 0 FROM f
DROP FOREIGN TABLE foreign_tbl CASCADE;
DROP TABLE base_tbl;
+-- Test async Merge Append
+CREATE TABLE distr1 (i int, j int, k text) PARTITION BY HASH (i);
+CREATE TABLE base1 (i int, j int, k text);
+CREATE TABLE base2 (i int, j int, k text);
+CREATE FOREIGN TABLE distr1_p1 PARTITION OF distr1 FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+SERVER loopback OPTIONS (table_name 'base1');
+CREATE FOREIGN TABLE distr1_p2 PARTITION OF distr1 FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+SERVER loopback OPTIONS (table_name 'base2');
+
+CREATE TABLE distr2 (i int, j int, k text) PARTITION BY HASH (i);
+CREATE TABLE base3 (i int, j int, k text);
+CREATE TABLE base4 (i int, j int, k text);
+CREATE FOREIGN TABLE distr2_p1 PARTITION OF distr2 FOR VALUES WITH (MODULUS 2, REMAINDER 0)
+SERVER loopback OPTIONS (table_name 'base3');
+CREATE FOREIGN TABLE distr2_p2 PARTITION OF distr2 FOR VALUES WITH (MODULUS 2, REMAINDER 1)
+SERVER loopback OPTIONS (table_name 'base4');
+
+INSERT INTO distr1
+SELECT i, i*10, 'data_' || i FROM generate_series(1, 1000) i;
+
+INSERT INTO distr2
+SELECT i, i*10, 'data_' || i FROM generate_series(1, 100) i;
+
+ANALYZE distr1_p1;
+ANALYZE distr1_p2;
+ANALYZE distr2_p1;
+ANALYZE distr2_p2;
+
+SET enable_partitionwise_join TO ON;
+
+-- Test joins with async Merge Append
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM distr1, distr2 WHERE distr1.i=distr2.i AND distr2.j > 90 and distr2.k like 'data%'
+ORDER BY distr2.i LIMIT 10;
+SELECT * FROM distr1, distr2 WHERE distr1.i=distr2.i AND distr2.j > 90 and distr2.k like 'data%'
+ORDER BY distr2.i LIMIT 10;
+
+EXPLAIN (VERBOSE, COSTS OFF)
+SELECT * FROM distr1 LEFT JOIN distr2 ON distr1.i=distr2.i AND distr2.k like 'data%' WHERE distr1.i > 90
+ORDER BY distr1.i LIMIT 20;
+SELECT * FROM distr1 LEFT JOIN distr2 ON distr1.i=distr2.i AND distr2.k like 'data%' WHERE distr1.i > 90
+ORDER BY distr1.i LIMIT 20;
+
+-- Test pruning with async Merge Append
+DELETE FROM distr2;
+INSERT INTO distr2
+SELECT i%10, i*10, 'data_' || i FROM generate_series(1, 1000) i;
+
+DEALLOCATE ALL;
+SET plan_cache_mode TO force_generic_plan;
+PREPARE async_pt_query (int, int) AS
+ SELECT * FROM distr2 WHERE i = ANY(ARRAY[$1, $2])
+ ORDER BY i,j
+ LIMIT 10;
+EXPLAIN (VERBOSE, COSTS OFF)
+ EXECUTE async_pt_query(1, 1);
+EXECUTE async_pt_query(1, 1);
+RESET plan_cache_mode;
+
+RESET enable_partitionwise_join;
+
+DROP TABLE distr1, distr2, base1, base2, base3, base4;
+
ALTER SERVER loopback OPTIONS (DROP async_capable);
ALTER SERVER loopback2 OPTIONS (DROP async_capable);
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index cdfe8e376f0..f7f7e33cee6 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5473,6 +5473,20 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
</listitem>
</varlistentry>
+ <varlistentry id="guc-enable-async-merge-append" xreflabel="enable_async_merge_append">
+ <term><varname>enable_async_merge_append</varname> (<type>boolean</type>)
+ <indexterm>
+ <primary><varname>enable_async_merge_append</varname> configuration parameter</primary>
+ </indexterm>
+ </term>
+ <listitem>
+ <para>
+ Enables or disables the query planner's use of async-aware
+ merge append plan types. The default is <literal>on</literal>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry id="guc-enable-bitmapscan" xreflabel="enable_bitmapscan">
<term><varname>enable_bitmapscan</varname> (<type>boolean</type>)
<indexterm>
diff --git a/src/backend/executor/execAsync.c b/src/backend/executor/execAsync.c
index 5d3cabe73e3..6dc19ebc374 100644
--- a/src/backend/executor/execAsync.c
+++ b/src/backend/executor/execAsync.c
@@ -17,6 +17,7 @@
#include "executor/execAsync.h"
#include "executor/executor.h"
#include "executor/nodeAppend.h"
+#include "executor/nodeMergeAppend.h"
#include "executor/nodeForeignscan.h"
/*
@@ -121,6 +122,9 @@ ExecAsyncResponse(AsyncRequest *areq)
case T_AppendState:
ExecAsyncAppendResponse(areq);
break;
+ case T_MergeAppendState:
+ ExecAsyncMergeAppendResponse(areq);
+ break;
default:
/* If the node doesn't support async, caller messed up. */
elog(ERROR, "unrecognized node type: %d",
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 77c4dd9e4b1..dfbc7b510c4 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -1187,10 +1187,7 @@ ExecAsyncAppendResponse(AsyncRequest *areq)
static void
classify_matching_subplans(AppendState *node)
{
- Bitmapset *valid_asyncplans;
-
Assert(node->as_valid_subplans_identified);
- Assert(node->as_valid_asyncplans == NULL);
/* Nothing to do if there are no valid subplans. */
if (bms_is_empty(node->as_valid_subplans))
@@ -1200,21 +1197,10 @@ classify_matching_subplans(AppendState *node)
return;
}
- /* Nothing to do if there are no valid async subplans. */
- if (!bms_overlap(node->as_valid_subplans, node->as_asyncplans))
- {
+ /* No valid async subplans identified. */
+ if (!classify_matching_subplans_common(
+ &node->as_valid_subplans,
+ node->as_asyncplans,
+ &node->as_valid_asyncplans))
node->as_nasyncremain = 0;
- return;
- }
-
- /* Get valid async subplans. */
- valid_asyncplans = bms_intersect(node->as_asyncplans,
- node->as_valid_subplans);
-
- /* Adjust the valid subplans to contain sync subplans only. */
- node->as_valid_subplans = bms_del_members(node->as_valid_subplans,
- valid_asyncplans);
-
- /* Save valid async subplans. */
- node->as_valid_asyncplans = valid_asyncplans;
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 300bcd5cf33..f1c267eb9eb 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -39,10 +39,15 @@
#include "postgres.h"
#include "executor/executor.h"
+#include "executor/execAsync.h"
#include "executor/execPartition.h"
#include "executor/nodeMergeAppend.h"
#include "lib/binaryheap.h"
#include "miscadmin.h"
+#include "storage/latch.h"
+#include "utils/wait_event.h"
+
+#define EVENT_BUFFER_SIZE 16
/*
* We have one slot for each item in the heap array. We use SlotNumber
@@ -54,6 +59,12 @@ typedef int32 SlotNumber;
static TupleTableSlot *ExecMergeAppend(PlanState *pstate);
static int heap_compare_slots(Datum a, Datum b, void *arg);
+static void classify_matching_subplans(MergeAppendState *node);
+static void ExecMergeAppendAsyncBegin(MergeAppendState *node);
+static void ExecMergeAppendAsyncGetNext(MergeAppendState *node, int mplan);
+static bool ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan);
+static void ExecMergeAppendAsyncEventWait(MergeAppendState *node);
+
/* ----------------------------------------------------------------
* ExecInitMergeAppend
@@ -71,6 +82,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
int nplans;
int i,
j;
+ Bitmapset *asyncplans;
+ int nasyncplans;
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -106,7 +119,10 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* later calls to ExecFindMatchingSubPlans.
*/
if (!prunestate->do_exec_prune && nplans > 0)
+ {
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
+ mergestate->ms_valid_subplans_identified = true;
+ }
}
else
{
@@ -119,6 +135,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
Assert(nplans > 0);
mergestate->ms_valid_subplans = validsubplans =
bms_add_range(NULL, 0, nplans - 1);
+ mergestate->ms_valid_subplans_identified = true;
mergestate->ms_prune_state = NULL;
}
@@ -135,11 +152,25 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* the results into the mergeplanstates array.
*/
j = 0;
+ asyncplans = NULL;
+ nasyncplans = 0;
+
i = -1;
while ((i = bms_next_member(validsubplans, i)) >= 0)
{
Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
+ /*
+ * Record async subplans. When executing EvalPlanQual, we treat them
+ * as sync ones; don't do this when initializing an EvalPlanQual plan
+ * tree.
+ */
+ if (initNode->async_capable && estate->es_epq_active == NULL)
+ {
+ asyncplans = bms_add_member(asyncplans, j);
+ nasyncplans++;
+ }
+
mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
}
@@ -170,6 +201,45 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
mergestate->ps.ps_ProjInfo = NULL;
+ /* Initialize async state */
+ mergestate->ms_asyncplans = asyncplans;
+ mergestate->ms_nasyncplans = nasyncplans;
+ mergestate->ms_asyncrequests = NULL;
+ mergestate->ms_asyncresults = NULL;
+ mergestate->ms_has_asyncresults = NULL;
+ mergestate->ms_asyncremain = NULL;
+ mergestate->ms_needrequest = NULL;
+ mergestate->ms_eventset = NULL;
+ mergestate->ms_valid_asyncplans = NULL;
+
+ if (nasyncplans > 0)
+ {
+ mergestate->ms_asyncrequests = (AsyncRequest **)
+ palloc0(nplans * sizeof(AsyncRequest *));
+
+ i = -1;
+ while ((i = bms_next_member(asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq;
+
+ areq = palloc(sizeof(AsyncRequest));
+ areq->requestor = (PlanState *) mergestate;
+ areq->requestee = mergeplanstates[i];
+ areq->request_index = i;
+ areq->callback_pending = false;
+ areq->request_complete = false;
+ areq->result = NULL;
+
+ mergestate->ms_asyncrequests[i] = areq;
+ }
+
+ mergestate->ms_asyncresults = (TupleTableSlot **)
+ palloc0(nplans * sizeof(TupleTableSlot *));
+
+ if (mergestate->ms_valid_subplans_identified)
+ classify_matching_subplans(mergestate);
+ }
+
/*
* initialize sort-key information
*/
@@ -226,14 +296,18 @@ ExecMergeAppend(PlanState *pstate)
if (node->ms_nplans == 0)
return ExecClearTuple(node->ps.ps_ResultTupleSlot);
- /*
- * If we've yet to determine the valid subplans then do so now. If
- * run-time pruning is disabled then the valid subplans will always be
- * set to all subplans.
- */
- if (node->ms_valid_subplans == NULL)
+ /* If we've yet to determine the valid subplans then do so now. */
+ if (!node->ms_valid_subplans_identified)
+ {
node->ms_valid_subplans =
ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
+ node->ms_valid_subplans_identified = true;
+ classify_matching_subplans(node);
+ }
+
+ /* If there are any async subplans, begin executing them. */
+ if (node->ms_nasyncplans > 0)
+ ExecMergeAppendAsyncBegin(node);
/*
* First time through: pull the first tuple from each valid subplan,
@@ -246,6 +320,16 @@ ExecMergeAppend(PlanState *pstate)
if (!TupIsNull(node->ms_slots[i]))
binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
}
+
+ /* Look at valid async subplans */
+ i = -1;
+ while ((i = bms_next_member(node->ms_valid_asyncplans, i)) >= 0)
+ {
+ ExecMergeAppendAsyncGetNext(node, i);
+ if (!TupIsNull(node->ms_slots[i]))
+ binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
+ }
+
binaryheap_build(node->ms_heap);
node->ms_initialized = true;
}
@@ -260,7 +344,13 @@ ExecMergeAppend(PlanState *pstate)
* to not pull tuples until necessary.)
*/
i = DatumGetInt32(binaryheap_first(node->ms_heap));
- node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+ if (bms_is_member(i, node->ms_asyncplans))
+ ExecMergeAppendAsyncGetNext(node, i);
+ else
+ {
+ Assert(bms_is_member(i, node->ms_valid_subplans));
+ node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+ }
if (!TupIsNull(node->ms_slots[i]))
binaryheap_replace_first(node->ms_heap, Int32GetDatum(i));
else
@@ -276,6 +366,8 @@ ExecMergeAppend(PlanState *pstate)
{
i = DatumGetInt32(binaryheap_first(node->ms_heap));
result = node->ms_slots[i];
+ /* For async plan record that we can get the next tuple */
+ node->ms_has_asyncresults = bms_del_member(node->ms_has_asyncresults, i);
}
return result;
@@ -355,6 +447,7 @@ void
ExecReScanMergeAppend(MergeAppendState *node)
{
int i;
+ int nasyncplans = node->ms_nasyncplans;
/*
* If any PARAM_EXEC Params used in pruning expressions have changed, then
@@ -365,8 +458,11 @@ ExecReScanMergeAppend(MergeAppendState *node)
bms_overlap(node->ps.chgParam,
node->ms_prune_state->execparamids))
{
+ node->ms_valid_subplans_identified = false;
bms_free(node->ms_valid_subplans);
node->ms_valid_subplans = NULL;
+ bms_free(node->ms_valid_asyncplans);
+ node->ms_valid_asyncplans = NULL;
}
for (i = 0; i < node->ms_nplans; i++)
@@ -387,6 +483,367 @@ ExecReScanMergeAppend(MergeAppendState *node)
if (subnode->chgParam == NULL)
ExecReScan(subnode);
}
+
+ /* Reset async state */
+ if (nasyncplans > 0)
+ {
+ i = -1;
+ while ((i = bms_next_member(node->ms_asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->ms_asyncrequests[i];
+
+ areq->callback_pending = false;
+ areq->request_complete = false;
+ areq->result = NULL;
+ }
+
+ bms_free(node->ms_asyncremain);
+ node->ms_asyncremain = NULL;
+ bms_free(node->ms_needrequest);
+ node->ms_needrequest = NULL;
+ bms_free(node->ms_has_asyncresults);
+ node->ms_has_asyncresults = NULL;
+ }
binaryheap_reset(node->ms_heap);
node->ms_initialized = false;
}
+
+/* ----------------------------------------------------------------
+ * classify_matching_subplans
+ *
+ * Classify the node's ms_valid_subplans into sync ones and
+ * async ones, adjust it to contain sync ones only, and save
+ * async ones in the node's ms_valid_asyncplans.
+ * ----------------------------------------------------------------
+ */
+static void
+classify_matching_subplans(MergeAppendState *node)
+{
+ Assert(node->ms_valid_subplans_identified);
+
+ /* Nothing to do if there are no valid subplans. */
+ if (bms_is_empty(node->ms_valid_subplans))
+ {
+ node->ms_asyncremain = NULL;
+ return;
+ }
+
+ /* No valid async subplans identified. */
+ if (!classify_matching_subplans_common(
+ &node->ms_valid_subplans,
+ node->ms_asyncplans,
+ &node->ms_valid_asyncplans))
+ node->ms_asyncremain = NULL;
+}
+
+/* ----------------------------------------------------------------
+ * ExecMergeAppendAsyncBegin
+ *
+ * Begin executing designed async-capable subplans.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecMergeAppendAsyncBegin(MergeAppendState *node)
+{
+ int i;
+
+ /* Backward scan is not supported by async-aware MergeAppends. */
+ Assert(ScanDirectionIsForward(node->ps.state->es_direction));
+
+ /* We should never be called when there are no subplans */
+ Assert(node->ms_nplans > 0);
+
+ /* We should never be called when there are no async subplans. */
+ Assert(node->ms_nasyncplans > 0);
+
+ /* ExecMergeAppend() identifies valid subplans */
+ Assert(node->ms_valid_subplans_identified);
+
+ /* Initialize state variables. */
+ node->ms_asyncremain = bms_copy(node->ms_valid_asyncplans);
+
+ /* Nothing to do if there are no valid async subplans. */
+ if (bms_is_empty(node->ms_asyncremain))
+ return;
+
+ /* Make a request for each of the valid async subplans. */
+ i = -1;
+ while ((i = bms_next_member(node->ms_valid_asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->ms_asyncrequests[i];
+
+ Assert(areq->request_index == i);
+ Assert(!areq->callback_pending);
+
+ /* Do the actual work. */
+ ExecAsyncRequest(areq);
+ }
+}
+
+/* ----------------------------------------------------------------
+ * ExecMergeAppendAsyncGetNext
+ *
+ * Get the next tuple from specified asynchronous subplan.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecMergeAppendAsyncGetNext(MergeAppendState *node, int mplan)
+{
+ node->ms_slots[mplan] = NULL;
+
+ /* Request a tuple asynchronously. */
+ if (ExecMergeAppendAsyncRequest(node, mplan))
+ return;
+
+ /*
+ * node->ms_asyncremain can be NULL if we have fetched tuples, but haven't
+ * returned them yet. In this case ExecMergeAppendAsyncRequest() above
+ * just returns tuples without performing a request.
+ */
+ while (bms_is_member(mplan, node->ms_asyncremain))
+ {
+ CHECK_FOR_INTERRUPTS();
+
+ /* Wait or poll for async events. */
+ ExecMergeAppendAsyncEventWait(node);
+
+ /* Request a tuple asynchronously. */
+ if (ExecMergeAppendAsyncRequest(node, mplan))
+ return;
+
+ /*
+ * Waiting until there's no async requests pending or we got some
+ * tuples from our request
+ */
+ }
+
+ /* No tuples */
+ return;
+}
+
+/* ----------------------------------------------------------------
+ * ExecMergeAppendAsyncRequest
+ *
+ * Request a tuple asynchronously.
+ * ----------------------------------------------------------------
+ */
+static bool
+ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
+{
+ Bitmapset *needrequest;
+ int i;
+
+ /*
+ * If we've already fetched necessary data, just return it
+ */
+ if (bms_is_member(mplan, node->ms_has_asyncresults))
+ {
+ node->ms_slots[mplan] = node->ms_asyncresults[mplan];
+ return true;
+ }
+
+ /*
+ * Get a list of members which can process request and don't have data
+ * ready.
+ */
+ needrequest = NULL;
+ i = -1;
+ while ((i = bms_next_member(node->ms_needrequest, i)) >= 0)
+ {
+ if (!bms_is_member(i, node->ms_has_asyncresults))
+ needrequest = bms_add_member(needrequest, i);
+ }
+
+ /*
+ * If there's no members, which still need request, no need to send it.
+ */
+ if (bms_is_empty(needrequest))
+ return false;
+
+ /* Clear ms_needrequest flag, as we are going to send requests now */
+ node->ms_needrequest = bms_del_members(node->ms_needrequest, needrequest);
+
+ /* Make a new request for each of the async subplans that need it. */
+ i = -1;
+ while ((i = bms_next_member(needrequest, i)) >= 0)
+ {
+ AsyncRequest *areq = node->ms_asyncrequests[i];
+
+ /*
+ * We've just checked that subplan doesn't already have some fetched
+ * data
+ */
+ Assert(!bms_is_member(i, node->ms_has_asyncresults));
+
+ /* Do the actual work. */
+ ExecAsyncRequest(areq);
+ }
+ bms_free(needrequest);
+
+ /* Return needed asynchronously-generated results if any. */
+ if (bms_is_member(mplan, node->ms_has_asyncresults))
+ {
+ node->ms_slots[mplan] = node->ms_asyncresults[mplan];
+ return true;
+ }
+
+ return false;
+}
+
+/* ----------------------------------------------------------------
+ * ExecAsyncMergeAppendResponse
+ *
+ * Receive a response from an asynchronous request we made.
+ * ----------------------------------------------------------------
+ */
+void
+ExecAsyncMergeAppendResponse(AsyncRequest *areq)
+{
+ MergeAppendState *node = (MergeAppendState *) areq->requestor;
+ TupleTableSlot *slot = areq->result;
+
+ /* The result should be a TupleTableSlot or NULL. */
+ Assert(slot == NULL || IsA(slot, TupleTableSlot));
+ /* We should handle previous async result prior to getting new one */
+ Assert(!bms_is_member(areq->request_index, node->ms_has_asyncresults));
+
+ node->ms_asyncresults[areq->request_index] = NULL;
+ /* Nothing to do if the request is pending. */
+ if (!areq->request_complete)
+ {
+ /* The request would have been pending for a callback. */
+ Assert(areq->callback_pending);
+ return;
+ }
+
+ /* If the result is NULL or an empty slot, there's nothing more to do. */
+ if (TupIsNull(slot))
+ {
+ /* The ending subplan wouldn't have been pending for a callback. */
+ Assert(!areq->callback_pending);
+ node->ms_asyncremain = bms_del_member(node->ms_asyncremain,
+ areq->request_index);
+ return;
+ }
+
+ /* Mark that the async request has a result */
+ node->ms_has_asyncresults = bms_add_member(node->ms_has_asyncresults,
+ areq->request_index);
+ /* Save result so we can return it. */
+ node->ms_asyncresults[areq->request_index] = slot;
+
+ /*
+ * Mark the subplan that returned a result as ready for a new request. We
+ * don't launch another one here immediately because it might complete.
+ */
+ node->ms_needrequest = bms_add_member(node->ms_needrequest,
+ areq->request_index);
+}
+
+/* ----------------------------------------------------------------
+ * ExecMergeAppendAsyncEventWait
+ *
+ * Wait or poll for file descriptor events and fire callbacks.
+ * ----------------------------------------------------------------
+ */
+static void
+ExecMergeAppendAsyncEventWait(MergeAppendState *node)
+{
+ int nevents = node->ms_nasyncplans + 2; /* one for PM death and
+ * one for latch */
+ WaitEvent occurred_event[EVENT_BUFFER_SIZE];
+ int noccurred;
+ int i;
+
+ /* We should never be called when there are no valid async subplans. */
+ Assert(bms_num_members(node->ms_asyncremain) > 0);
+
+ node->ms_eventset = CreateWaitEventSet(CurrentResourceOwner, nevents);
+ AddWaitEventToSet(node->ms_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
+ NULL, NULL);
+
+ /* Give each waiting subplan a chance to add an event. */
+ i = -1;
+ while ((i = bms_next_member(node->ms_asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->ms_asyncrequests[i];
+
+ if (areq->callback_pending)
+ ExecAsyncConfigureWait(areq);
+ }
+
+ /*
+ * No need for further processing if none of the subplans configured any
+ * events.
+ */
+ if (GetNumRegisteredWaitEvents(node->ms_eventset) == 1)
+ {
+ FreeWaitEventSet(node->ms_eventset);
+ node->ms_eventset = NULL;
+ return;
+ }
+
+ /*
+ * Add the process latch to the set, so that we wake up to process the
+ * standard interrupts with CHECK_FOR_INTERRUPTS().
+ *
+ * NOTE: For historical reasons, it's important that this is added to the
+ * WaitEventSet after the ExecAsyncConfigureWait() calls. Namely,
+ * postgres_fdw calls "GetNumRegisteredWaitEvents(set) == 1" to check if
+ * any other events are in the set. That's a poor design, it's
+ * questionable for postgres_fdw to be doing that in the first place, but
+ * we cannot change it now. The pattern has possibly been copied to other
+ * extensions too.
+ */
+ AddWaitEventToSet(node->ms_eventset, WL_LATCH_SET, PGINVALID_SOCKET,
+ MyLatch, NULL);
+
+ /* Return at most EVENT_BUFFER_SIZE events in one call. */
+ if (nevents > EVENT_BUFFER_SIZE)
+ nevents = EVENT_BUFFER_SIZE;
+
+ /*
+ * Wait until at least one event occurs.
+ */
+ noccurred = WaitEventSetWait(node->ms_eventset, -1 /* no timeout */ , occurred_event,
+ nevents, WAIT_EVENT_APPEND_READY);
+ FreeWaitEventSet(node->ms_eventset);
+ node->ms_eventset = NULL;
+ if (noccurred == 0)
+ return;
+
+ /* Deliver notifications. */
+ for (i = 0; i < noccurred; i++)
+ {
+ WaitEvent *w = &occurred_event[i];
+
+ /*
+ * Each waiting subplan should have registered its wait event with
+ * user_data pointing back to its AsyncRequest.
+ */
+ if ((w->events & WL_SOCKET_READABLE) != 0)
+ {
+ AsyncRequest *areq = (AsyncRequest *) w->user_data;
+
+ if (areq->callback_pending)
+ {
+ /*
+ * Mark it as no longer needing a callback. We must do this
+ * before dispatching the callback in case the callback resets
+ * the flag.
+ */
+ areq->callback_pending = false;
+
+ /* Do the actual work. */
+ ExecAsyncNotify(areq);
+ }
+ }
+
+ /* Handle standard interrupts */
+ if ((w->events & WL_LATCH_SET) != 0)
+ {
+ ResetLatch(MyLatch);
+ CHECK_FOR_INTERRUPTS();
+ }
+ }
+}
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 54931cd6e2a..e7d741a7699 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -163,6 +163,7 @@ bool enable_parallel_hash = true;
bool enable_partition_pruning = true;
bool enable_presorted_aggregate = true;
bool enable_async_append = true;
+bool enable_async_merge_append = true;
typedef struct
{
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 05fac693989..fc0e5c62ea9 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1466,6 +1466,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
+ bool consider_async = false;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1480,6 +1481,10 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
plan->righttree = NULL;
node->apprelids = rel->relids;
+ consider_async = (enable_async_merge_append &&
+ !best_path->path.parallel_safe &&
+ list_length(best_path->subpaths) > 1);
+
/*
* Compute sort column info, and adjust MergeAppend's tlist as needed.
* Because we pass adjust_tlist_in_place = true, we may ignore the
@@ -1580,6 +1585,10 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplan = sort_plan;
}
+ /* If needed, check to see if subplan can be executed asynchronously */
+ if (consider_async)
+ mark_async_capable_plan(subplan, subpath);
+
subplans = lappend(subplans, subplan);
}
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index ac0c7c36c56..b60b23fa4b9 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -821,6 +821,14 @@
boot_val => 'true',
},
+{ name => 'enable_async_merge_append', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
+ short_desc => 'Enables the planner\'s use of async merge append plans.',
+ flags => 'GUC_EXPLAIN',
+ variable => 'enable_async_merge_append',
+ boot_val => 'true',
+},
+
+
{ name => 'enable_bitmapscan', type => 'bool', context => 'PGC_USERSET', group => 'QUERY_TUNING_METHOD',
short_desc => 'Enables the planner\'s use of bitmap-scan plans.',
flags => 'GUC_EXPLAIN',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dc9e2255f8a..d949d2aad04 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -405,6 +405,7 @@
# - Planner Method Configuration -
#enable_async_append = on
+#enable_async_merge_append = on
#enable_bitmapscan = on
#enable_gathermerge = on
#enable_hashagg = on
diff --git a/src/include/executor/nodeMergeAppend.h b/src/include/executor/nodeMergeAppend.h
index 4eb05dc30d6..e3fdb26ece6 100644
--- a/src/include/executor/nodeMergeAppend.h
+++ b/src/include/executor/nodeMergeAppend.h
@@ -19,5 +19,6 @@
extern MergeAppendState *ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags);
extern void ExecEndMergeAppend(MergeAppendState *node);
extern void ExecReScanMergeAppend(MergeAppendState *node);
+extern void ExecAsyncMergeAppendResponse(AsyncRequest *areq);
#endif /* NODEMERGEAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 3968429f991..5887cbf4f16 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1545,10 +1545,69 @@ typedef struct MergeAppendState
TupleTableSlot **ms_slots; /* array of length ms_nplans */
struct binaryheap *ms_heap; /* binary heap of slot indices */
bool ms_initialized; /* are subplans started? */
+ Bitmapset *ms_asyncplans; /* asynchronous plans indexes */
+ int ms_nasyncplans; /* # of asynchronous plans */
+ AsyncRequest **ms_asyncrequests; /* array of AsyncRequests */
+ TupleTableSlot **ms_asyncresults; /* unreturned results of async plans */
+ Bitmapset *ms_has_asyncresults; /* plans which have async results */
+ Bitmapset *ms_asyncremain; /* remaining asynchronous plans */
+ Bitmapset *ms_needrequest; /* asynchronous plans needing a new request */
+ struct WaitEventSet *ms_eventset; /* WaitEventSet used to configure file
+ * descriptor wait events */
struct PartitionPruneState *ms_prune_state;
+ bool ms_valid_subplans_identified; /* is ms_valid_subplans valid? */
Bitmapset *ms_valid_subplans;
+ Bitmapset *ms_valid_asyncplans; /* valid asynchronous plans indexes */
} MergeAppendState;
+/* Getters for AppendState and MergeAppendState */
+static inline struct WaitEventSet *
+GetAppendEventSet(PlanState *ps)
+{
+ Assert(IsA(ps, AppendState) || IsA(ps, MergeAppendState));
+
+ if (IsA(ps, AppendState))
+ return ((AppendState *) ps)->as_eventset;
+ else
+ return ((MergeAppendState *) ps)->ms_eventset;
+}
+
+static inline Bitmapset *
+GetNeedRequest(PlanState *ps)
+{
+ Assert(IsA(ps, AppendState) || IsA(ps, MergeAppendState));
+
+ if (IsA(ps, AppendState))
+ return ((AppendState *) ps)->as_needrequest;
+ else
+ return ((MergeAppendState *) ps)->ms_needrequest;
+}
+
+/* Common part of classify_matching_subplans() for Append and MergeAppend */
+static inline bool
+classify_matching_subplans_common(Bitmapset **valid_subplans,
+ Bitmapset *asyncplans,
+ Bitmapset **valid_asyncplans)
+{
+ Assert(*valid_asyncplans == NULL);
+
+ /* Checked by classify_matching_subplans() */
+ Assert(!bms_is_empty(*valid_subplans));
+
+ /* Nothing to do if there are no valid async subplans. */
+ if (!bms_overlap(*valid_subplans, asyncplans))
+ return false;
+
+ /* Get valid async subplans. */
+ *valid_asyncplans = bms_intersect(asyncplans,
+ *valid_subplans);
+
+ /* Adjust the valid subplans to contain sync subplans only. */
+ *valid_subplans = bms_del_members(*valid_subplans,
+ *valid_asyncplans);
+ return true;
+}
+
/* ----------------
* RecursiveUnionState information
*
diff --git a/src/include/optimizer/cost.h b/src/include/optimizer/cost.h
index b523bcda8f3..fee491b77ad 100644
--- a/src/include/optimizer/cost.h
+++ b/src/include/optimizer/cost.h
@@ -70,6 +70,7 @@ extern PGDLLIMPORT bool enable_parallel_hash;
extern PGDLLIMPORT bool enable_partition_pruning;
extern PGDLLIMPORT bool enable_presorted_aggregate;
extern PGDLLIMPORT bool enable_async_append;
+extern PGDLLIMPORT bool enable_async_merge_append;
extern PGDLLIMPORT int constraint_exclusion;
extern double index_pages_fetched(double tuples_fetched, BlockNumber pages,
diff --git a/src/test/regress/expected/sysviews.out b/src/test/regress/expected/sysviews.out
index 0411db832f1..194b1f95289 100644
--- a/src/test/regress/expected/sysviews.out
+++ b/src/test/regress/expected/sysviews.out
@@ -149,6 +149,7 @@ select name, setting from pg_settings where name like 'enable%';
name | setting
--------------------------------+---------
enable_async_append | on
+ enable_async_merge_append | on
enable_bitmapscan | on
enable_distinct_reordering | on
enable_eager_aggregate | on
@@ -173,7 +174,7 @@ select name, setting from pg_settings where name like 'enable%';
enable_seqscan | on
enable_sort | on
enable_tidscan | on
-(25 rows)
+(26 rows)
-- There are always wait event descriptions for various types. InjectionPoint
-- may be present or absent, depending on history since last postmaster start.
--
2.51.2
[text/plain] v12-0003-Common-infrastructure-to-MergeAppend-and-Append-.patch (87.3K, 4-v12-0003-Common-infrastructure-to-MergeAppend-and-Append-.patch)
download | inline diff:
From f26775222d238441a5bd2fa60757d09cf50fd2cd Mon Sep 17 00:00:00 2001
From: Matheus Alcantara <[email protected]>
Date: Tue, 16 Dec 2025 16:32:14 -0300
Subject: [PATCH v12 3/3] Common infrastructure to MergeAppend and Append nodes
This commit introduces the execAppend.c file to centralize executor
operations that can be shared between MergeAppend and Append nodes.
The logic for initializing and ending the nodes, as well as initializing
asynchronous execution, is very similar for both MergeAppend and Append.
To reduce redundancy, the duplicated code has been moved to execAppend.c
so that these nodes can reuse the common implementation.
The code responsible for actually fetching the tuples remains specific
to each node and was not refactored into execAppend.c to preserve the
unique execution requirements of each node type.
---
contrib/pg_overexplain/pg_overexplain.c | 4 +-
contrib/postgres_fdw/postgres_fdw.c | 8 +-
src/backend/commands/explain.c | 26 +-
src/backend/executor/Makefile | 1 +
src/backend/executor/execAmi.c | 2 +-
src/backend/executor/execAppend.c | 409 +++++++++++++++++++
src/backend/executor/execCurrent.c | 4 +-
src/backend/executor/execProcnode.c | 8 +-
src/backend/executor/meson.build | 1 +
src/backend/executor/nodeAppend.c | 497 +++++-------------------
src/backend/executor/nodeMergeAppend.c | 416 +++-----------------
src/backend/nodes/nodeFuncs.c | 8 +-
src/backend/optimizer/plan/createplan.c | 34 +-
src/backend/optimizer/plan/setrefs.c | 44 +--
src/backend/optimizer/plan/subselect.c | 4 +-
src/backend/utils/adt/ruleutils.c | 8 +-
src/include/executor/execAppend.h | 60 +++
src/include/nodes/execnodes.h | 105 ++---
src/include/nodes/plannodes.h | 45 +--
19 files changed, 746 insertions(+), 938 deletions(-)
create mode 100644 src/backend/executor/execAppend.c
create mode 100644 src/include/executor/execAppend.h
diff --git a/contrib/pg_overexplain/pg_overexplain.c b/contrib/pg_overexplain/pg_overexplain.c
index fcdc17012da..7f18c2ab06c 100644
--- a/contrib/pg_overexplain/pg_overexplain.c
+++ b/contrib/pg_overexplain/pg_overexplain.c
@@ -228,12 +228,12 @@ overexplain_per_node_hook(PlanState *planstate, List *ancestors,
break;
case T_Append:
overexplain_bitmapset("Append RTIs",
- ((Append *) plan)->apprelids,
+ ((Append *) plan)->ap.apprelids,
es);
break;
case T_MergeAppend:
overexplain_bitmapset("Append RTIs",
- ((MergeAppend *) plan)->apprelids,
+ ((MergeAppend *) plan)->ap.apprelids,
es);
break;
case T_Result:
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index bd551a1db72..b01ad40ad17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -2412,8 +2412,8 @@ find_modifytable_subplan(PlannerInfo *root,
{
Append *appendplan = (Append *) subplan;
- if (subplan_index < list_length(appendplan->appendplans))
- subplan = (Plan *) list_nth(appendplan->appendplans, subplan_index);
+ if (subplan_index < list_length(appendplan->ap.subplans))
+ subplan = (Plan *) list_nth(appendplan->ap.subplans, subplan_index);
}
else if (IsA(subplan, Result) &&
outerPlan(subplan) != NULL &&
@@ -2421,8 +2421,8 @@ find_modifytable_subplan(PlannerInfo *root,
{
Append *appendplan = (Append *) outerPlan(subplan);
- if (subplan_index < list_length(appendplan->appendplans))
- subplan = (Plan *) list_nth(appendplan->appendplans, subplan_index);
+ if (subplan_index < list_length(appendplan->ap.subplans))
+ subplan = (Plan *) list_nth(appendplan->ap.subplans, subplan_index);
}
/* Now, have we got a ForeignScan on the desired rel? */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5a6390631eb..3eaa1f7459e 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1224,11 +1224,11 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
break;
case T_Append:
*rels_used = bms_add_members(*rels_used,
- ((Append *) plan)->apprelids);
+ ((Append *) plan)->ap.apprelids);
break;
case T_MergeAppend:
*rels_used = bms_add_members(*rels_used,
- ((MergeAppend *) plan)->apprelids);
+ ((MergeAppend *) plan)->ap.apprelids);
break;
case T_Result:
*rels_used = bms_add_members(*rels_used,
@@ -1272,7 +1272,7 @@ plan_is_disabled(Plan *plan)
* includes any run-time pruned children. Ignoring those could give
* us the incorrect number of disabled nodes.
*/
- foreach(lc, aplan->appendplans)
+ foreach(lc, aplan->ap.subplans)
{
Plan *subplan = lfirst(lc);
@@ -1289,7 +1289,7 @@ plan_is_disabled(Plan *plan)
* includes any run-time pruned children. Ignoring those could give
* us the incorrect number of disabled nodes.
*/
- foreach(lc, maplan->mergeplans)
+ foreach(lc, maplan->ap.subplans)
{
Plan *subplan = lfirst(lc);
@@ -2336,13 +2336,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
switch (nodeTag(plan))
{
case T_Append:
- ExplainMissingMembers(((AppendState *) planstate)->as_nplans,
- list_length(((Append *) plan)->appendplans),
+ ExplainMissingMembers(((AppendState *) planstate)->as.nplans,
+ list_length(((Append *) plan)->ap.subplans),
es);
break;
case T_MergeAppend:
- ExplainMissingMembers(((MergeAppendState *) planstate)->ms_nplans,
- list_length(((MergeAppend *) plan)->mergeplans),
+ ExplainMissingMembers(((MergeAppendState *) planstate)->ms.nplans,
+ list_length(((MergeAppend *) plan)->ap.subplans),
es);
break;
default:
@@ -2386,13 +2386,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
switch (nodeTag(plan))
{
case T_Append:
- ExplainMemberNodes(((AppendState *) planstate)->appendplans,
- ((AppendState *) planstate)->as_nplans,
+ ExplainMemberNodes(((AppendState *) planstate)->as.plans,
+ ((AppendState *) planstate)->as.nplans,
ancestors, es);
break;
case T_MergeAppend:
- ExplainMemberNodes(((MergeAppendState *) planstate)->mergeplans,
- ((MergeAppendState *) planstate)->ms_nplans,
+ ExplainMemberNodes(((MergeAppendState *) planstate)->ms.plans,
+ ((MergeAppendState *) planstate)->ms.nplans,
ancestors, es);
break;
case T_BitmapAnd:
@@ -2606,7 +2606,7 @@ static void
show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
ExplainState *es)
{
- MergeAppend *plan = (MergeAppend *) mstate->ps.plan;
+ MergeAppend *plan = (MergeAppend *) mstate->ms.ps.plan;
show_sort_group_keys((PlanState *) mstate, "Sort Key",
plan->numCols, 0, plan->sortColIdx,
diff --git a/src/backend/executor/Makefile b/src/backend/executor/Makefile
index 11118d0ce02..66b62fca921 100644
--- a/src/backend/executor/Makefile
+++ b/src/backend/executor/Makefile
@@ -15,6 +15,7 @@ include $(top_builddir)/src/Makefile.global
OBJS = \
execAmi.o \
execAsync.o \
+ execAppend.o \
execCurrent.o \
execExpr.o \
execExprInterp.o \
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 1d0e8ad57b4..5c897048ba3 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -537,7 +537,7 @@ ExecSupportsBackwardScan(Plan *node)
if (((Append *) node)->nasyncplans > 0)
return false;
- foreach(l, ((Append *) node)->appendplans)
+ foreach(l, ((Append *) node)->ap.subplans)
{
if (!ExecSupportsBackwardScan((Plan *) lfirst(l)))
return false;
diff --git a/src/backend/executor/execAppend.c b/src/backend/executor/execAppend.c
new file mode 100644
index 00000000000..b7222ab7940
--- /dev/null
+++ b/src/backend/executor/execAppend.c
@@ -0,0 +1,409 @@
+/*-------------------------------------------------------------------------
+ *
+ * execAppend.c
+ * This code provides support functions for executing MergeAppend and Append
+ * nodes.
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ * src/backend/executor/execAppend.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+#include "executor/executor.h"
+#include "executor/execAppend.h"
+#include "executor/execAsync.h"
+#include "executor/execPartition.h"
+#include "storage/latch.h"
+#include "storage/waiteventset.h"
+#include "miscadmin.h"
+
+#define EVENT_BUFFER_SIZE 16
+
+/* Begin all of the subscans of an Appender node. */
+void
+ExecInitAppender(AppenderState * state,
+ Appender * node,
+ EState *estate,
+ int eflags,
+ int first_partial_plan,
+ int *first_valid_partial_plan)
+{
+ PlanState **appendplanstates;
+ const TupleTableSlotOps *appendops;
+ Bitmapset *validsubplans;
+ Bitmapset *asyncplans;
+ int nplans;
+ int nasyncplans;
+ int firstvalid;
+ int i,
+ j;
+
+ /* If run-time partition pruning is enabled, then set that up now */
+ if (node->part_prune_index >= 0)
+ {
+ PartitionPruneState *prunestate;
+
+ /*
+ * Set up pruning data structure. This also initializes the set of
+ * subplans to initialize (validsubplans) by taking into account the
+ * result of performing initial pruning if any.
+ */
+ prunestate = ExecInitPartitionExecPruning(&state->ps,
+ list_length(node->subplans),
+ node->part_prune_index,
+ node->apprelids,
+ &validsubplans);
+ state->prune_state = prunestate;
+ nplans = bms_num_members(validsubplans);
+
+ /*
+ * When no run-time pruning is required and there's at least one
+ * subplan, we can fill valid_subplans immediately, preventing
+ * later calls to ExecFindMatchingSubPlans.
+ */
+ if (!prunestate->do_exec_prune && nplans > 0)
+ {
+ state->valid_subplans = bms_add_range(NULL, 0, nplans - 1);
+ state->valid_subplans_identified = true;
+ }
+ }
+ else
+ {
+ nplans = list_length(node->subplans);
+
+ /*
+ * When run-time partition pruning is not enabled we can just mark all
+ * subplans as valid; they must also all be initialized.
+ */
+ Assert(nplans > 0);
+ state->valid_subplans = validsubplans =
+ bms_add_range(NULL, 0, nplans - 1);
+ state->valid_subplans_identified = true;
+ state->prune_state = NULL;
+ }
+
+ appendplanstates = palloc0_array(PlanState *, nplans);
+
+ /*
+ * call ExecInitNode on each of the valid plans to be executed and save
+ * the results into the appendplanstates array.
+ *
+ * While at it, find out the first valid partial plan.
+ */
+ j = 0;
+ asyncplans = NULL;
+ nasyncplans = 0;
+ firstvalid = nplans;
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *initNode = (Plan *) list_nth(node->subplans, i);
+
+ /*
+ * Record async subplans. When executing EvalPlanQual, we treat them
+ * as sync ones; don't do this when initializing an EvalPlanQual plan
+ * tree.
+ */
+ if (initNode->async_capable && estate->es_epq_active == NULL)
+ {
+ asyncplans = bms_add_member(asyncplans, j);
+ nasyncplans++;
+ }
+
+ /*
+ * Record the lowest appendplans index which is a valid partial plan.
+ */
+ if (first_valid_partial_plan && i >= first_partial_plan && j < firstvalid)
+ firstvalid = j;
+
+ appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
+ }
+
+ if (first_valid_partial_plan)
+ *first_valid_partial_plan = firstvalid;
+
+ state->plans = appendplanstates;
+ state->nplans = nplans;
+
+ /*
+ * Initialize Append's result tuple type and slot. If the child plans all
+ * produce the same fixed slot type, we can use that slot type; otherwise
+ * make a virtual slot. (Note that the result slot itself is used only to
+ * return a null tuple at end of execution; real tuples are returned to
+ * the caller in the children's own result slots. What we are doing here
+ * is allowing the parent plan node to optimize if the Append will return
+ * only one kind of slot.)
+ */
+ appendops = ExecGetCommonSlotOps(appendplanstates, j);
+ if (appendops != NULL)
+ {
+ ExecInitResultTupleSlotTL(&state->ps, appendops);
+ }
+ else
+ {
+ ExecInitResultTupleSlotTL(&state->ps, &TTSOpsVirtual);
+ /* show that the output slot type is not fixed */
+ state->ps.resultopsset = true;
+ state->ps.resultopsfixed = false;
+ }
+
+ /* Initialize async state */
+ state->asyncplans = asyncplans;
+ state->nasyncplans = nasyncplans;
+ state->asyncrequests = NULL;
+ state->asyncresults = NULL;
+ state->needrequest = NULL;
+ state->eventset = NULL;
+ state->valid_asyncplans = NULL;
+
+ if (nasyncplans > 0)
+ {
+ state->asyncrequests = palloc0_array(AsyncRequest *, nplans);
+
+ i = -1;
+ while ((i = bms_next_member(asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq;
+
+ areq = palloc_object(AsyncRequest);
+ areq->requestor = (PlanState *) state;
+ areq->requestee = appendplanstates[i];
+ areq->request_index = i;
+ areq->callback_pending = false;
+ areq->request_complete = false;
+ areq->result = NULL;
+
+ state->asyncrequests[i] = areq;
+ }
+
+ /*
+ * AppendState and MergeAppendState have slightly different allocation
+ * sizes for asyncresults in the original code, but we unify to the
+ * larger requirement or specific nplans if required.
+ */
+ state->asyncresults = palloc0_array(TupleTableSlot *, nplans);
+ }
+
+ /*
+ * Miscellaneous initialization
+ */
+ state->ps.ps_ProjInfo = NULL;
+}
+
+void
+ExecReScanAppender(AppenderState * node)
+{
+ int i;
+ int nasyncplans = node->nasyncplans;
+
+ /*
+ * If any PARAM_EXEC Params used in pruning expressions have changed, then
+ * we'd better unset the valid subplans so that they are reselected for
+ * the new parameter values.
+ */
+ if (node->prune_state &&
+ bms_overlap(node->ps.chgParam,
+ node->prune_state->execparamids))
+ {
+ node->valid_subplans_identified = false;
+ bms_free(node->valid_subplans);
+ node->valid_subplans = NULL;
+ bms_free(node->valid_asyncplans);
+ node->valid_asyncplans = NULL;
+ }
+
+ for (i = 0; i < node->nplans; i++)
+ {
+ PlanState *subnode = node->plans[i];
+
+ /*
+ * ExecReScan doesn't know about my subplans, so I have to do
+ * changed-parameter signaling myself.
+ */
+ if (node->ps.chgParam != NULL)
+ UpdateChangedParamSet(subnode, node->ps.chgParam);
+
+ /*
+ * If chgParam of subnode is not null then plan will be re-scanned by
+ * first ExecProcNode.
+ */
+ if (subnode->chgParam == NULL)
+ ExecReScan(subnode);
+ }
+
+ /* Reset async state */
+ if (nasyncplans > 0)
+ {
+ i = -1;
+ while ((i = bms_next_member(node->asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->asyncrequests[i];
+
+ areq->callback_pending = false;
+ areq->request_complete = false;
+ areq->result = NULL;
+ }
+
+ bms_free(node->needrequest);
+ node->needrequest = NULL;
+ }
+}
+
+/* Wait or poll for file descriptor events and fire callbacks. */
+void
+ExecAppenderAsyncEventWait(AppenderState * node, int timeout, uint32 wait_event_info)
+{
+ int nevents = node->nasyncplans + 2; /* one for PM death and
+ * one for latch */
+ int noccurred;
+ int i;
+ WaitEvent occurred_event[EVENT_BUFFER_SIZE];
+
+ Assert(node->eventset == NULL);
+
+ node->eventset = CreateWaitEventSet(CurrentResourceOwner, nevents);
+ AddWaitEventToSet(node->eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
+ NULL, NULL);
+
+ /* Give each waiting subplan a chance to add an event. */
+ i = -1;
+ while ((i = bms_next_member(node->asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->asyncrequests[i];
+
+ if (areq->callback_pending)
+ ExecAsyncConfigureWait(areq);
+ }
+
+ /*
+ * No need for further processing if none of the subplans configured any
+ * events.
+ */
+ if (GetNumRegisteredWaitEvents(node->eventset) == 1)
+ {
+ FreeWaitEventSet(node->eventset);
+ node->eventset = NULL;
+ return;
+ }
+
+ /*
+ * Add the process latch to the set, so that we wake up to process the
+ * standard interrupts with CHECK_FOR_INTERRUPTS().
+ *
+ * NOTE: For historical reasons, it's important that this is added to the
+ * WaitEventSet after the ExecAsyncConfigureWait() calls. Namely,
+ * postgres_fdw calls "GetNumRegisteredWaitEvents(set) == 1" to check if
+ * any other events are in the set. That's a poor design, it's
+ * questionable for postgres_fdw to be doing that in the first place, but
+ * we cannot change it now. The pattern has possibly been copied to other
+ * extensions too.
+ */
+ AddWaitEventToSet(node->eventset, WL_LATCH_SET, PGINVALID_SOCKET,
+ MyLatch, NULL);
+
+ /* Return at most EVENT_BUFFER_SIZE events in one call. */
+ if (nevents > EVENT_BUFFER_SIZE)
+ nevents = EVENT_BUFFER_SIZE;
+
+ /* Wait until at least one event occurs. */
+ noccurred = WaitEventSetWait(node->eventset, timeout, occurred_event,
+ nevents, wait_event_info);
+
+
+ FreeWaitEventSet(node->eventset);
+ node->eventset = NULL;
+ if (noccurred == 0)
+ return;
+
+
+ /* Deliver notifications. */
+ for (i = 0; i < noccurred; i++)
+ {
+ WaitEvent *w = &occurred_event[i];
+
+ /*
+ * Each waiting subplan should have registered its wait event with
+ * user_data pointing back to its AsyncRequest.
+ */
+ if ((w->events & WL_SOCKET_READABLE) != 0)
+ {
+ AsyncRequest *areq = (AsyncRequest *) w->user_data;
+
+ if (areq->callback_pending)
+ {
+ /*
+ * Mark it as no longer needing a callback. We must do this
+ * before dispatching the callback in case the callback resets
+ * the flag.
+ */
+ areq->callback_pending = false;
+
+ /* Do the actual work. */
+ ExecAsyncNotify(areq);
+ }
+ }
+
+ /* Handle standard interrupts */
+ if ((w->events & WL_LATCH_SET) != 0)
+ {
+ ResetLatch(MyLatch);
+ CHECK_FOR_INTERRUPTS();
+ }
+ }
+}
+
+/* Begin executing async-capable subplans. */
+void
+ExecAppenderAsyncBegin(AppenderState * node)
+{
+ int i;
+
+ /* Backward scan is not supported by async-aware Appends. */
+ Assert(ScanDirectionIsForward(node->ps.state->es_direction));
+
+ /* We should never be called when there are no subplans */
+ Assert(node->nplans > 0);
+
+ /* We should never be called when there are no async subplans. */
+ Assert(node->nasyncplans > 0);
+
+ /* Make a request for each of the valid async subplans. */
+ i = -1;
+ while ((i = bms_next_member(node->valid_asyncplans, i)) >= 0)
+ {
+ AsyncRequest *areq = node->asyncrequests[i];
+
+ Assert(areq->request_index == i);
+ Assert(!areq->callback_pending);
+
+ /* Do the actual work. */
+ ExecAsyncRequest(areq);
+ }
+}
+
+/* Shuts down the subplans of an Appender node. */
+void
+ExecEndAppender(AppenderState * node)
+{
+ PlanState **subplans;
+ int nplans;
+ int i;
+
+ /*
+ * get information from the node
+ */
+ subplans = node->plans;
+ nplans = node->nplans;
+
+ /*
+ * shut down each of the subscans
+ */
+ for (i = 0; i < nplans; i++)
+ ExecEndNode(subplans[i]);
+}
+
diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index 3bfdc0230ff..e8cf2ead8a8 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -375,9 +375,9 @@ search_plan_tree(PlanState *node, Oid table_oid,
AppendState *astate = (AppendState *) node;
int i;
- for (i = 0; i < astate->as_nplans; i++)
+ for (i = 0; i < astate->as.nplans; i++)
{
- ScanState *elem = search_plan_tree(astate->appendplans[i],
+ ScanState *elem = search_plan_tree(astate->as.plans[i],
table_oid,
pending_rescan);
diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c
index f5f9cfbeead..3eb1de1cd30 100644
--- a/src/backend/executor/execProcnode.c
+++ b/src/backend/executor/execProcnode.c
@@ -910,8 +910,8 @@ ExecSetTupleBound(int64 tuples_needed, PlanState *child_node)
AppendState *aState = (AppendState *) child_node;
int i;
- for (i = 0; i < aState->as_nplans; i++)
- ExecSetTupleBound(tuples_needed, aState->appendplans[i]);
+ for (i = 0; i < aState->as.nplans; i++)
+ ExecSetTupleBound(tuples_needed, aState->as.plans[i]);
}
else if (IsA(child_node, MergeAppendState))
{
@@ -923,8 +923,8 @@ ExecSetTupleBound(int64 tuples_needed, PlanState *child_node)
MergeAppendState *maState = (MergeAppendState *) child_node;
int i;
- for (i = 0; i < maState->ms_nplans; i++)
- ExecSetTupleBound(tuples_needed, maState->mergeplans[i]);
+ for (i = 0; i < maState->ms.nplans; i++)
+ ExecSetTupleBound(tuples_needed, maState->ms.plans[i]);
}
else if (IsA(child_node, ResultState))
{
diff --git a/src/backend/executor/meson.build b/src/backend/executor/meson.build
index 2cea41f8771..b5cb710a59f 100644
--- a/src/backend/executor/meson.build
+++ b/src/backend/executor/meson.build
@@ -3,6 +3,7 @@
backend_sources += files(
'execAmi.c',
'execAsync.c',
+ 'execAppend.c',
'execCurrent.c',
'execExpr.c',
'execExprInterp.c',
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index dfbc7b510c4..5c39ee275d2 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -57,13 +57,13 @@
#include "postgres.h"
+#include "executor/execAppend.h"
#include "executor/execAsync.h"
#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAppend.h"
#include "miscadmin.h"
#include "pgstat.h"
-#include "storage/latch.h"
/* Shared state for parallel-aware Append. */
struct ParallelAppendState
@@ -109,15 +109,6 @@ AppendState *
ExecInitAppend(Append *node, EState *estate, int eflags)
{
AppendState *appendstate = makeNode(AppendState);
- PlanState **appendplanstates;
- const TupleTableSlotOps *appendops;
- Bitmapset *validsubplans;
- Bitmapset *asyncplans;
- int nplans;
- int nasyncplans;
- int firstvalid;
- int i,
- j;
/* check for unsupported flags */
Assert(!(eflags & EXEC_FLAG_MARK));
@@ -125,167 +116,27 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/*
* create new AppendState for our append node
*/
- appendstate->ps.plan = (Plan *) node;
- appendstate->ps.state = estate;
- appendstate->ps.ExecProcNode = ExecAppend;
+ appendstate->as.ps.plan = (Plan *) node;
+ appendstate->as.ps.state = estate;
+ appendstate->as.ps.ExecProcNode = ExecAppend;
/* Let choose_next_subplan_* function handle setting the first subplan */
appendstate->as_whichplan = INVALID_SUBPLAN_INDEX;
appendstate->as_syncdone = false;
appendstate->as_begun = false;
- /* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_index >= 0)
- {
- PartitionPruneState *prunestate;
-
- /*
- * Set up pruning data structure. This also initializes the set of
- * subplans to initialize (validsubplans) by taking into account the
- * result of performing initial pruning if any.
- */
- prunestate = ExecInitPartitionExecPruning(&appendstate->ps,
- list_length(node->appendplans),
- node->part_prune_index,
- node->apprelids,
- &validsubplans);
- appendstate->as_prune_state = prunestate;
- nplans = bms_num_members(validsubplans);
-
- /*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill as_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
- */
- if (!prunestate->do_exec_prune && nplans > 0)
- {
- appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
- appendstate->as_valid_subplans_identified = true;
- }
- }
- else
- {
- nplans = list_length(node->appendplans);
-
- /*
- * When run-time partition pruning is not enabled we can just mark all
- * subplans as valid; they must also all be initialized.
- */
- Assert(nplans > 0);
- appendstate->as_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
- appendstate->as_valid_subplans_identified = true;
- appendstate->as_prune_state = NULL;
- }
-
- appendplanstates = (PlanState **) palloc(nplans *
- sizeof(PlanState *));
-
- /*
- * call ExecInitNode on each of the valid plans to be executed and save
- * the results into the appendplanstates array.
- *
- * While at it, find out the first valid partial plan.
- */
- j = 0;
- asyncplans = NULL;
- nasyncplans = 0;
- firstvalid = nplans;
- i = -1;
- while ((i = bms_next_member(validsubplans, i)) >= 0)
- {
- Plan *initNode = (Plan *) list_nth(node->appendplans, i);
-
- /*
- * Record async subplans. When executing EvalPlanQual, we treat them
- * as sync ones; don't do this when initializing an EvalPlanQual plan
- * tree.
- */
- if (initNode->async_capable && estate->es_epq_active == NULL)
- {
- asyncplans = bms_add_member(asyncplans, j);
- nasyncplans++;
- }
-
- /*
- * Record the lowest appendplans index which is a valid partial plan.
- */
- if (i >= node->first_partial_plan && j < firstvalid)
- firstvalid = j;
-
- appendplanstates[j++] = ExecInitNode(initNode, estate, eflags);
- }
-
- appendstate->as_first_partial_plan = firstvalid;
- appendstate->appendplans = appendplanstates;
- appendstate->as_nplans = nplans;
+ /* Initialize common fields */
+ ExecInitAppender(&appendstate->as,
+ &node->ap,
+ estate,
+ eflags,
+ node->first_partial_plan,
+ &appendstate->as_first_partial_plan);
- /*
- * Initialize Append's result tuple type and slot. If the child plans all
- * produce the same fixed slot type, we can use that slot type; otherwise
- * make a virtual slot. (Note that the result slot itself is used only to
- * return a null tuple at end of execution; real tuples are returned to
- * the caller in the children's own result slots. What we are doing here
- * is allowing the parent plan node to optimize if the Append will return
- * only one kind of slot.)
- */
- appendops = ExecGetCommonSlotOps(appendplanstates, j);
- if (appendops != NULL)
- {
- ExecInitResultTupleSlotTL(&appendstate->ps, appendops);
- }
- else
- {
- ExecInitResultTupleSlotTL(&appendstate->ps, &TTSOpsVirtual);
- /* show that the output slot type is not fixed */
- appendstate->ps.resultopsset = true;
- appendstate->ps.resultopsfixed = false;
- }
+ if (appendstate->as.nasyncplans > 0 && appendstate->as.valid_subplans_identified)
+ classify_matching_subplans(appendstate);
- /* Initialize async state */
- appendstate->as_asyncplans = asyncplans;
- appendstate->as_nasyncplans = nasyncplans;
- appendstate->as_asyncrequests = NULL;
- appendstate->as_asyncresults = NULL;
- appendstate->as_nasyncresults = 0;
appendstate->as_nasyncremain = 0;
- appendstate->as_needrequest = NULL;
- appendstate->as_eventset = NULL;
- appendstate->as_valid_asyncplans = NULL;
-
- if (nasyncplans > 0)
- {
- appendstate->as_asyncrequests = (AsyncRequest **)
- palloc0(nplans * sizeof(AsyncRequest *));
-
- i = -1;
- while ((i = bms_next_member(asyncplans, i)) >= 0)
- {
- AsyncRequest *areq;
-
- areq = palloc_object(AsyncRequest);
- areq->requestor = (PlanState *) appendstate;
- areq->requestee = appendplanstates[i];
- areq->request_index = i;
- areq->callback_pending = false;
- areq->request_complete = false;
- areq->result = NULL;
-
- appendstate->as_asyncrequests[i] = areq;
- }
-
- appendstate->as_asyncresults = (TupleTableSlot **)
- palloc0(nasyncplans * sizeof(TupleTableSlot *));
-
- if (appendstate->as_valid_subplans_identified)
- classify_matching_subplans(appendstate);
- }
-
- /*
- * Miscellaneous initialization
- */
-
- appendstate->ps.ps_ProjInfo = NULL;
/* For parallel query, this will be overridden later. */
appendstate->choose_next_subplan = choose_next_subplan_locally;
@@ -315,11 +166,11 @@ ExecAppend(PlanState *pstate)
Assert(!node->as_syncdone);
/* Nothing to do if there are no subplans */
- if (node->as_nplans == 0)
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->as.nplans == 0)
+ return ExecClearTuple(node->as.ps.ps_ResultTupleSlot);
/* If there are any async subplans, begin executing them. */
- if (node->as_nasyncplans > 0)
+ if (node->as.nasyncplans > 0)
ExecAppendAsyncBegin(node);
/*
@@ -327,11 +178,11 @@ ExecAppend(PlanState *pstate)
* proceeding.
*/
if (!node->choose_next_subplan(node) && node->as_nasyncremain == 0)
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ return ExecClearTuple(node->as.ps.ps_ResultTupleSlot);
Assert(node->as_syncdone ||
(node->as_whichplan >= 0 &&
- node->as_whichplan < node->as_nplans));
+ node->as_whichplan < node->as.nplans));
/* And we're initialized. */
node->as_begun = true;
@@ -346,19 +197,19 @@ ExecAppend(PlanState *pstate)
/*
* try to get a tuple from an async subplan if any
*/
- if (node->as_syncdone || !bms_is_empty(node->as_needrequest))
+ if (node->as_syncdone || !bms_is_empty(node->as.needrequest))
{
if (ExecAppendAsyncGetNext(node, &result))
return result;
Assert(!node->as_syncdone);
- Assert(bms_is_empty(node->as_needrequest));
+ Assert(bms_is_empty(node->as.needrequest));
}
/*
* figure out which sync subplan we are currently processing
*/
- Assert(node->as_whichplan >= 0 && node->as_whichplan < node->as_nplans);
- subnode = node->appendplans[node->as_whichplan];
+ Assert(node->as_whichplan >= 0 && node->as_whichplan < node->as.nplans);
+ subnode = node->as.plans[node->as_whichplan];
/*
* get a tuple from the subplan
@@ -385,7 +236,7 @@ ExecAppend(PlanState *pstate)
/* choose new sync subplan; if no sync/async subplans, we're done */
if (!node->choose_next_subplan(node) && node->as_nasyncremain == 0)
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ return ExecClearTuple(node->as.ps.ps_ResultTupleSlot);
}
}
@@ -400,81 +251,22 @@ ExecAppend(PlanState *pstate)
void
ExecEndAppend(AppendState *node)
{
- PlanState **appendplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- appendplans = node->appendplans;
- nplans = node->as_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(appendplans[i]);
+ ExecEndAppender(&node->as);
}
void
ExecReScanAppend(AppendState *node)
{
- int nasyncplans = node->as_nasyncplans;
- int i;
-
- /*
- * If any PARAM_EXEC Params used in pruning expressions have changed, then
- * we'd better unset the valid subplans so that they are reselected for
- * the new parameter values.
- */
- if (node->as_prune_state &&
- bms_overlap(node->ps.chgParam,
- node->as_prune_state->execparamids))
- {
- node->as_valid_subplans_identified = false;
- bms_free(node->as_valid_subplans);
- node->as_valid_subplans = NULL;
- bms_free(node->as_valid_asyncplans);
- node->as_valid_asyncplans = NULL;
- }
-
- for (i = 0; i < node->as_nplans; i++)
- {
- PlanState *subnode = node->appendplans[i];
- /*
- * ExecReScan doesn't know about my subplans, so I have to do
- * changed-parameter signaling myself.
- */
- if (node->ps.chgParam != NULL)
- UpdateChangedParamSet(subnode, node->ps.chgParam);
+ int nasyncplans = node->as.nasyncplans;
- /*
- * If chgParam of subnode is not null then plan will be re-scanned by
- * first ExecProcNode or by first ExecAsyncRequest.
- */
- if (subnode->chgParam == NULL)
- ExecReScan(subnode);
- }
+ ExecReScanAppender(&node->as);
- /* Reset async state */
+ /* Reset specific append async state */
if (nasyncplans > 0)
{
- i = -1;
- while ((i = bms_next_member(node->as_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->as_asyncrequests[i];
-
- areq->callback_pending = false;
- areq->request_complete = false;
- areq->result = NULL;
- }
-
node->as_nasyncresults = 0;
node->as_nasyncremain = 0;
- bms_free(node->as_needrequest);
- node->as_needrequest = NULL;
}
/* Let choose_next_subplan_* function handle setting the first subplan */
@@ -501,7 +293,7 @@ ExecAppendEstimate(AppendState *node,
{
node->pstate_len =
add_size(offsetof(ParallelAppendState, pa_finished),
- sizeof(bool) * node->as_nplans);
+ sizeof(bool) * node->as.nplans);
shm_toc_estimate_chunk(&pcxt->estimator, node->pstate_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
@@ -523,7 +315,7 @@ ExecAppendInitializeDSM(AppendState *node,
pstate = shm_toc_allocate(pcxt->toc, node->pstate_len);
memset(pstate, 0, node->pstate_len);
LWLockInitialize(&pstate->pa_lock, LWTRANCHE_PARALLEL_APPEND);
- shm_toc_insert(pcxt->toc, node->ps.plan->plan_node_id, pstate);
+ shm_toc_insert(pcxt->toc, node->as.ps.plan->plan_node_id, pstate);
node->as_pstate = pstate;
node->choose_next_subplan = choose_next_subplan_for_leader;
@@ -541,7 +333,7 @@ ExecAppendReInitializeDSM(AppendState *node, ParallelContext *pcxt)
ParallelAppendState *pstate = node->as_pstate;
pstate->pa_next_plan = 0;
- memset(pstate->pa_finished, 0, sizeof(bool) * node->as_nplans);
+ memset(pstate->pa_finished, 0, sizeof(bool) * node->as.nplans);
}
/* ----------------------------------------------------------------
@@ -554,7 +346,7 @@ ExecAppendReInitializeDSM(AppendState *node, ParallelContext *pcxt)
void
ExecAppendInitializeWorker(AppendState *node, ParallelWorkerContext *pwcxt)
{
- node->as_pstate = shm_toc_lookup(pwcxt->toc, node->ps.plan->plan_node_id, false);
+ node->as_pstate = shm_toc_lookup(pwcxt->toc, node->as.ps.plan->plan_node_id, false);
node->choose_next_subplan = choose_next_subplan_for_worker;
}
@@ -572,7 +364,7 @@ choose_next_subplan_locally(AppendState *node)
int nextplan;
/* We should never be called when there are no subplans */
- Assert(node->as_nplans > 0);
+ Assert(node->as.nplans > 0);
/* Nothing to do if syncdone */
if (node->as_syncdone)
@@ -587,33 +379,33 @@ choose_next_subplan_locally(AppendState *node)
*/
if (whichplan == INVALID_SUBPLAN_INDEX)
{
- if (node->as_nasyncplans > 0)
+ if (node->as.nasyncplans > 0)
{
/* We'd have filled as_valid_subplans already */
- Assert(node->as_valid_subplans_identified);
+ Assert(node->as.valid_subplans_identified);
}
- else if (!node->as_valid_subplans_identified)
+ else if (!node->as.valid_subplans_identified)
{
- node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
- node->as_valid_subplans_identified = true;
+ node->as.valid_subplans =
+ ExecFindMatchingSubPlans(node->as.prune_state, false, NULL);
+ node->as.valid_subplans_identified = true;
}
whichplan = -1;
}
/* Ensure whichplan is within the expected range */
- Assert(whichplan >= -1 && whichplan <= node->as_nplans);
+ Assert(whichplan >= -1 && whichplan <= node->as.nplans);
- if (ScanDirectionIsForward(node->ps.state->es_direction))
- nextplan = bms_next_member(node->as_valid_subplans, whichplan);
+ if (ScanDirectionIsForward(node->as.ps.state->es_direction))
+ nextplan = bms_next_member(node->as.valid_subplans, whichplan);
else
- nextplan = bms_prev_member(node->as_valid_subplans, whichplan);
+ nextplan = bms_prev_member(node->as.valid_subplans, whichplan);
if (nextplan < 0)
{
/* Set as_syncdone if in async mode */
- if (node->as_nasyncplans > 0)
+ if (node->as.nasyncplans > 0)
node->as_syncdone = true;
return false;
}
@@ -637,10 +429,10 @@ choose_next_subplan_for_leader(AppendState *node)
ParallelAppendState *pstate = node->as_pstate;
/* Backward scan is not supported by parallel-aware plans */
- Assert(ScanDirectionIsForward(node->ps.state->es_direction));
+ Assert(ScanDirectionIsForward(node->as.ps.state->es_direction));
/* We should never be called when there are no subplans */
- Assert(node->as_nplans > 0);
+ Assert(node->as.nplans > 0);
LWLockAcquire(&pstate->pa_lock, LW_EXCLUSIVE);
@@ -652,18 +444,18 @@ choose_next_subplan_for_leader(AppendState *node)
else
{
/* Start with last subplan. */
- node->as_whichplan = node->as_nplans - 1;
+ node->as_whichplan = node->as.nplans - 1;
/*
* If we've yet to determine the valid subplans then do so now. If
* run-time pruning is disabled then the valid subplans will always be
* set to all subplans.
*/
- if (!node->as_valid_subplans_identified)
+ if (!node->as.valid_subplans_identified)
{
- node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
- node->as_valid_subplans_identified = true;
+ node->as.valid_subplans =
+ ExecFindMatchingSubPlans(node->as.prune_state, false, NULL);
+ node->as.valid_subplans_identified = true;
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -719,10 +511,10 @@ choose_next_subplan_for_worker(AppendState *node)
ParallelAppendState *pstate = node->as_pstate;
/* Backward scan is not supported by parallel-aware plans */
- Assert(ScanDirectionIsForward(node->ps.state->es_direction));
+ Assert(ScanDirectionIsForward(node->as.ps.state->es_direction));
/* We should never be called when there are no subplans */
- Assert(node->as_nplans > 0);
+ Assert(node->as.nplans > 0);
LWLockAcquire(&pstate->pa_lock, LW_EXCLUSIVE);
@@ -735,11 +527,11 @@ choose_next_subplan_for_worker(AppendState *node)
* run-time pruning is disabled then the valid subplans will always be set
* to all subplans.
*/
- else if (!node->as_valid_subplans_identified)
+ else if (!node->as.valid_subplans_identified)
{
- node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
- node->as_valid_subplans_identified = true;
+ node->as.valid_subplans =
+ ExecFindMatchingSubPlans(node->as.prune_state, false, NULL);
+ node->as.valid_subplans_identified = true;
mark_invalid_subplans_as_finished(node);
}
@@ -759,7 +551,7 @@ choose_next_subplan_for_worker(AppendState *node)
{
int nextplan;
- nextplan = bms_next_member(node->as_valid_subplans,
+ nextplan = bms_next_member(node->as.valid_subplans,
pstate->pa_next_plan);
if (nextplan >= 0)
{
@@ -772,7 +564,7 @@ choose_next_subplan_for_worker(AppendState *node)
* Try looping back to the first valid partial plan, if there is
* one. If there isn't, arrange to bail out below.
*/
- nextplan = bms_next_member(node->as_valid_subplans,
+ nextplan = bms_next_member(node->as.valid_subplans,
node->as_first_partial_plan - 1);
pstate->pa_next_plan =
nextplan < 0 ? node->as_whichplan : nextplan;
@@ -797,7 +589,7 @@ choose_next_subplan_for_worker(AppendState *node)
/* Pick the plan we found, and advance pa_next_plan one more time. */
node->as_whichplan = pstate->pa_next_plan;
- pstate->pa_next_plan = bms_next_member(node->as_valid_subplans,
+ pstate->pa_next_plan = bms_next_member(node->as.valid_subplans,
pstate->pa_next_plan);
/*
@@ -806,7 +598,7 @@ choose_next_subplan_for_worker(AppendState *node)
*/
if (pstate->pa_next_plan < 0)
{
- int nextplan = bms_next_member(node->as_valid_subplans,
+ int nextplan = bms_next_member(node->as.valid_subplans,
node->as_first_partial_plan - 1);
if (nextplan >= 0)
@@ -848,16 +640,16 @@ mark_invalid_subplans_as_finished(AppendState *node)
Assert(node->as_pstate);
/* Shouldn't have been called when run-time pruning is not enabled */
- Assert(node->as_prune_state);
+ Assert(node->as.prune_state);
/* Nothing to do if all plans are valid */
- if (bms_num_members(node->as_valid_subplans) == node->as_nplans)
+ if (bms_num_members(node->as.valid_subplans) == node->as.nplans)
return;
/* Mark all non-valid plans as finished */
- for (i = 0; i < node->as_nplans; i++)
+ for (i = 0; i < node->as.nplans; i++)
{
- if (!bms_is_member(i, node->as_valid_subplans))
+ if (!bms_is_member(i, node->as.valid_subplans))
node->as_pstate->pa_finished[i] = true;
}
}
@@ -876,47 +668,25 @@ mark_invalid_subplans_as_finished(AppendState *node)
static void
ExecAppendAsyncBegin(AppendState *node)
{
- int i;
-
- /* Backward scan is not supported by async-aware Appends. */
- Assert(ScanDirectionIsForward(node->ps.state->es_direction));
-
- /* We should never be called when there are no subplans */
- Assert(node->as_nplans > 0);
-
- /* We should never be called when there are no async subplans. */
- Assert(node->as_nasyncplans > 0);
-
/* If we've yet to determine the valid subplans then do so now. */
- if (!node->as_valid_subplans_identified)
+ if (!node->as.valid_subplans_identified)
{
- node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
- node->as_valid_subplans_identified = true;
+ node->as.valid_subplans =
+ ExecFindMatchingSubPlans(node->as.prune_state, false, NULL);
+ node->as.valid_subplans_identified = true;
classify_matching_subplans(node);
}
/* Initialize state variables. */
- node->as_syncdone = bms_is_empty(node->as_valid_subplans);
- node->as_nasyncremain = bms_num_members(node->as_valid_asyncplans);
+ node->as_syncdone = bms_is_empty(node->as.valid_subplans);
+ node->as_nasyncremain = bms_num_members(node->as.valid_asyncplans);
/* Nothing to do if there are no valid async subplans. */
if (node->as_nasyncremain == 0)
return;
- /* Make a request for each of the valid async subplans. */
- i = -1;
- while ((i = bms_next_member(node->as_valid_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->as_asyncrequests[i];
-
- Assert(areq->request_index == i);
- Assert(!areq->callback_pending);
-
- /* Do the actual work. */
- ExecAsyncRequest(areq);
- }
+ ExecAppenderAsyncBegin(&node->as);
}
/* ----------------------------------------------------------------
@@ -961,7 +731,7 @@ ExecAppendAsyncGetNext(AppendState *node, TupleTableSlot **result)
if (node->as_syncdone)
{
Assert(node->as_nasyncremain == 0);
- *result = ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ *result = ExecClearTuple(node->as.ps.ps_ResultTupleSlot);
return true;
}
@@ -981,7 +751,7 @@ ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result)
int i;
/* Nothing to do if there are no async subplans needing a new request. */
- if (bms_is_empty(node->as_needrequest))
+ if (bms_is_empty(node->as.needrequest))
{
Assert(node->as_nasyncresults == 0);
return false;
@@ -994,17 +764,17 @@ ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result)
if (node->as_nasyncresults > 0)
{
--node->as_nasyncresults;
- *result = node->as_asyncresults[node->as_nasyncresults];
+ *result = node->as.asyncresults[node->as_nasyncresults];
return true;
}
/* Make a new request for each of the async subplans that need it. */
- needrequest = node->as_needrequest;
- node->as_needrequest = NULL;
+ needrequest = node->as.needrequest;
+ node->as.needrequest = NULL;
i = -1;
while ((i = bms_next_member(needrequest, i)) >= 0)
{
- AsyncRequest *areq = node->as_asyncrequests[i];
+ AsyncRequest *areq = node->as.asyncrequests[i];
/* Do the actual work. */
ExecAsyncRequest(areq);
@@ -1015,7 +785,7 @@ ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result)
if (node->as_nasyncresults > 0)
{
--node->as_nasyncresults;
- *result = node->as_asyncresults[node->as_nasyncresults];
+ *result = node->as.asyncresults[node->as_nasyncresults];
return true;
}
@@ -1031,105 +801,12 @@ ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result)
static void
ExecAppendAsyncEventWait(AppendState *node)
{
- int nevents = node->as_nasyncplans + 2;
long timeout = node->as_syncdone ? -1 : 0;
- WaitEvent occurred_event[EVENT_BUFFER_SIZE];
- int noccurred;
- int i;
/* We should never be called when there are no valid async subplans. */
Assert(node->as_nasyncremain > 0);
- Assert(node->as_eventset == NULL);
- node->as_eventset = CreateWaitEventSet(CurrentResourceOwner, nevents);
- AddWaitEventToSet(node->as_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
- NULL, NULL);
-
- /* Give each waiting subplan a chance to add an event. */
- i = -1;
- while ((i = bms_next_member(node->as_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->as_asyncrequests[i];
-
- if (areq->callback_pending)
- ExecAsyncConfigureWait(areq);
- }
-
- /*
- * No need for further processing if none of the subplans configured any
- * events.
- */
- if (GetNumRegisteredWaitEvents(node->as_eventset) == 1)
- {
- FreeWaitEventSet(node->as_eventset);
- node->as_eventset = NULL;
- return;
- }
-
- /*
- * Add the process latch to the set, so that we wake up to process the
- * standard interrupts with CHECK_FOR_INTERRUPTS().
- *
- * NOTE: For historical reasons, it's important that this is added to the
- * WaitEventSet after the ExecAsyncConfigureWait() calls. Namely,
- * postgres_fdw calls "GetNumRegisteredWaitEvents(set) == 1" to check if
- * any other events are in the set. That's a poor design, it's
- * questionable for postgres_fdw to be doing that in the first place, but
- * we cannot change it now. The pattern has possibly been copied to other
- * extensions too.
- */
- AddWaitEventToSet(node->as_eventset, WL_LATCH_SET, PGINVALID_SOCKET,
- MyLatch, NULL);
-
- /* Return at most EVENT_BUFFER_SIZE events in one call. */
- if (nevents > EVENT_BUFFER_SIZE)
- nevents = EVENT_BUFFER_SIZE;
-
- /*
- * If the timeout is -1, wait until at least one event occurs. If the
- * timeout is 0, poll for events, but do not wait at all.
- */
- noccurred = WaitEventSetWait(node->as_eventset, timeout, occurred_event,
- nevents, WAIT_EVENT_APPEND_READY);
- FreeWaitEventSet(node->as_eventset);
- node->as_eventset = NULL;
- if (noccurred == 0)
- return;
-
- /* Deliver notifications. */
- for (i = 0; i < noccurred; i++)
- {
- WaitEvent *w = &occurred_event[i];
-
- /*
- * Each waiting subplan should have registered its wait event with
- * user_data pointing back to its AsyncRequest.
- */
- if ((w->events & WL_SOCKET_READABLE) != 0)
- {
- AsyncRequest *areq = (AsyncRequest *) w->user_data;
-
- if (areq->callback_pending)
- {
- /*
- * Mark it as no longer needing a callback. We must do this
- * before dispatching the callback in case the callback resets
- * the flag.
- */
- areq->callback_pending = false;
-
- /* Do the actual work. */
- ExecAsyncNotify(areq);
- }
- }
-
- /* Handle standard interrupts */
- if ((w->events & WL_LATCH_SET) != 0)
- {
- ResetLatch(MyLatch);
- CHECK_FOR_INTERRUPTS();
- }
- }
+ ExecAppenderAsyncEventWait(&node->as, timeout, WAIT_EVENT_APPEND_READY);
}
/* ----------------------------------------------------------------
@@ -1165,14 +842,14 @@ ExecAsyncAppendResponse(AsyncRequest *areq)
}
/* Save result so we can return it. */
- Assert(node->as_nasyncresults < node->as_nasyncplans);
- node->as_asyncresults[node->as_nasyncresults++] = slot;
+ Assert(node->as_nasyncresults < node->as.nasyncplans);
+ node->as.asyncresults[node->as_nasyncresults++] = slot;
/*
* Mark the subplan that returned a result as ready for a new request. We
* don't launch another one here immediately because it might complete.
*/
- node->as_needrequest = bms_add_member(node->as_needrequest,
+ node->as.needrequest = bms_add_member(node->as.needrequest,
areq->request_index);
}
@@ -1187,10 +864,10 @@ ExecAsyncAppendResponse(AsyncRequest *areq)
static void
classify_matching_subplans(AppendState *node)
{
- Assert(node->as_valid_subplans_identified);
+ Assert(node->as.valid_subplans_identified);
/* Nothing to do if there are no valid subplans. */
- if (bms_is_empty(node->as_valid_subplans))
+ if (bms_is_empty(node->as.valid_subplans))
{
node->as_syncdone = true;
node->as_nasyncremain = 0;
@@ -1199,8 +876,8 @@ classify_matching_subplans(AppendState *node)
/* No valid async subplans identified. */
if (!classify_matching_subplans_common(
- &node->as_valid_subplans,
- node->as_asyncplans,
- &node->as_valid_asyncplans))
+ &node->as.valid_subplans,
+ node->as.asyncplans,
+ &node->as.valid_asyncplans))
node->as_nasyncremain = 0;
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f1c267eb9eb..e1a207aeb85 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -38,6 +38,7 @@
#include "postgres.h"
+#include "executor/execAppend.h"
#include "executor/executor.h"
#include "executor/execAsync.h"
#include "executor/execPartition.h"
@@ -76,14 +77,7 @@ MergeAppendState *
ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
MergeAppendState *mergestate = makeNode(MergeAppendState);
- PlanState **mergeplanstates;
- const TupleTableSlotOps *mergeops;
- Bitmapset *validsubplans;
- int nplans;
- int i,
- j;
- Bitmapset *asyncplans;
- int nasyncplans;
+ int i;
/* check for unsupported flags */
Assert(!(eflags & (EXEC_FLAG_BACKWARD | EXEC_FLAG_MARK)));
@@ -91,154 +85,27 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/*
* create new MergeAppendState for our node
*/
- mergestate->ps.plan = (Plan *) node;
- mergestate->ps.state = estate;
- mergestate->ps.ExecProcNode = ExecMergeAppend;
-
- /* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_index >= 0)
- {
- PartitionPruneState *prunestate;
-
- /*
- * Set up pruning data structure. This also initializes the set of
- * subplans to initialize (validsubplans) by taking into account the
- * result of performing initial pruning if any.
- */
- prunestate = ExecInitPartitionExecPruning(&mergestate->ps,
- list_length(node->mergeplans),
- node->part_prune_index,
- node->apprelids,
- &validsubplans);
- mergestate->ms_prune_state = prunestate;
- nplans = bms_num_members(validsubplans);
-
- /*
- * When no run-time pruning is required and there's at least one
- * subplan, we can fill ms_valid_subplans immediately, preventing
- * later calls to ExecFindMatchingSubPlans.
- */
- if (!prunestate->do_exec_prune && nplans > 0)
- {
- mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
- mergestate->ms_valid_subplans_identified = true;
- }
- }
- else
- {
- nplans = list_length(node->mergeplans);
-
- /*
- * When run-time partition pruning is not enabled we can just mark all
- * subplans as valid; they must also all be initialized.
- */
- Assert(nplans > 0);
- mergestate->ms_valid_subplans = validsubplans =
- bms_add_range(NULL, 0, nplans - 1);
- mergestate->ms_valid_subplans_identified = true;
- mergestate->ms_prune_state = NULL;
- }
-
- mergeplanstates = palloc_array(PlanState *, nplans);
- mergestate->mergeplans = mergeplanstates;
- mergestate->ms_nplans = nplans;
-
- mergestate->ms_slots = palloc0_array(TupleTableSlot *, nplans);
- mergestate->ms_heap = binaryheap_allocate(nplans, heap_compare_slots,
+ mergestate->ms.ps.plan = (Plan *) node;
+ mergestate->ms.ps.state = estate;
+ mergestate->ms.ps.ExecProcNode = ExecMergeAppend;
+
+ /* Initialize common fields */
+ ExecInitAppender(&mergestate->ms,
+ &node->ap,
+ estate,
+ eflags,
+ -1,
+ NULL);
+
+ if (mergestate->ms.nasyncplans > 0 && mergestate->ms.valid_subplans_identified)
+ classify_matching_subplans(mergestate);
+
+ mergestate->ms_slots = palloc0_array(TupleTableSlot *, mergestate->ms.nplans);
+ mergestate->ms_heap = binaryheap_allocate(mergestate->ms.nplans, heap_compare_slots,
mergestate);
- /*
- * call ExecInitNode on each of the valid plans to be executed and save
- * the results into the mergeplanstates array.
- */
- j = 0;
- asyncplans = NULL;
- nasyncplans = 0;
-
- i = -1;
- while ((i = bms_next_member(validsubplans, i)) >= 0)
- {
- Plan *initNode = (Plan *) list_nth(node->mergeplans, i);
-
- /*
- * Record async subplans. When executing EvalPlanQual, we treat them
- * as sync ones; don't do this when initializing an EvalPlanQual plan
- * tree.
- */
- if (initNode->async_capable && estate->es_epq_active == NULL)
- {
- asyncplans = bms_add_member(asyncplans, j);
- nasyncplans++;
- }
-
- mergeplanstates[j++] = ExecInitNode(initNode, estate, eflags);
- }
-
- /*
- * Initialize MergeAppend's result tuple type and slot. If the child
- * plans all produce the same fixed slot type, we can use that slot type;
- * otherwise make a virtual slot. (Note that the result slot itself is
- * used only to return a null tuple at end of execution; real tuples are
- * returned to the caller in the children's own result slots. What we are
- * doing here is allowing the parent plan node to optimize if the
- * MergeAppend will return only one kind of slot.)
- */
- mergeops = ExecGetCommonSlotOps(mergeplanstates, j);
- if (mergeops != NULL)
- {
- ExecInitResultTupleSlotTL(&mergestate->ps, mergeops);
- }
- else
- {
- ExecInitResultTupleSlotTL(&mergestate->ps, &TTSOpsVirtual);
- /* show that the output slot type is not fixed */
- mergestate->ps.resultopsset = true;
- mergestate->ps.resultopsfixed = false;
- }
-
- /*
- * Miscellaneous initialization
- */
- mergestate->ps.ps_ProjInfo = NULL;
-
- /* Initialize async state */
- mergestate->ms_asyncplans = asyncplans;
- mergestate->ms_nasyncplans = nasyncplans;
- mergestate->ms_asyncrequests = NULL;
- mergestate->ms_asyncresults = NULL;
mergestate->ms_has_asyncresults = NULL;
mergestate->ms_asyncremain = NULL;
- mergestate->ms_needrequest = NULL;
- mergestate->ms_eventset = NULL;
- mergestate->ms_valid_asyncplans = NULL;
-
- if (nasyncplans > 0)
- {
- mergestate->ms_asyncrequests = (AsyncRequest **)
- palloc0(nplans * sizeof(AsyncRequest *));
-
- i = -1;
- while ((i = bms_next_member(asyncplans, i)) >= 0)
- {
- AsyncRequest *areq;
-
- areq = palloc(sizeof(AsyncRequest));
- areq->requestor = (PlanState *) mergestate;
- areq->requestee = mergeplanstates[i];
- areq->request_index = i;
- areq->callback_pending = false;
- areq->request_complete = false;
- areq->result = NULL;
-
- mergestate->ms_asyncrequests[i] = areq;
- }
-
- mergestate->ms_asyncresults = (TupleTableSlot **)
- palloc0(nplans * sizeof(TupleTableSlot *));
-
- if (mergestate->ms_valid_subplans_identified)
- classify_matching_subplans(mergestate);
- }
/*
* initialize sort-key information
@@ -293,20 +160,20 @@ ExecMergeAppend(PlanState *pstate)
if (!node->ms_initialized)
{
/* Nothing to do if all subplans were pruned */
- if (node->ms_nplans == 0)
- return ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ if (node->ms.nplans == 0)
+ return ExecClearTuple(node->ms.ps.ps_ResultTupleSlot);
/* If we've yet to determine the valid subplans then do so now. */
- if (!node->ms_valid_subplans_identified)
+ if (!node->ms.valid_subplans_identified)
{
- node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
- node->ms_valid_subplans_identified = true;
+ node->ms.valid_subplans =
+ ExecFindMatchingSubPlans(node->ms.prune_state, false, NULL);
+ node->ms.valid_subplans_identified = true;
classify_matching_subplans(node);
}
/* If there are any async subplans, begin executing them. */
- if (node->ms_nasyncplans > 0)
+ if (node->ms.nasyncplans > 0)
ExecMergeAppendAsyncBegin(node);
/*
@@ -314,16 +181,16 @@ ExecMergeAppend(PlanState *pstate)
* and set up the heap.
*/
i = -1;
- while ((i = bms_next_member(node->ms_valid_subplans, i)) >= 0)
+ while ((i = bms_next_member(node->ms.valid_subplans, i)) >= 0)
{
- node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+ node->ms_slots[i] = ExecProcNode(node->ms.plans[i]);
if (!TupIsNull(node->ms_slots[i]))
binaryheap_add_unordered(node->ms_heap, Int32GetDatum(i));
}
/* Look at valid async subplans */
i = -1;
- while ((i = bms_next_member(node->ms_valid_asyncplans, i)) >= 0)
+ while ((i = bms_next_member(node->ms.valid_asyncplans, i)) >= 0)
{
ExecMergeAppendAsyncGetNext(node, i);
if (!TupIsNull(node->ms_slots[i]))
@@ -344,12 +211,12 @@ ExecMergeAppend(PlanState *pstate)
* to not pull tuples until necessary.)
*/
i = DatumGetInt32(binaryheap_first(node->ms_heap));
- if (bms_is_member(i, node->ms_asyncplans))
+ if (bms_is_member(i, node->ms.asyncplans))
ExecMergeAppendAsyncGetNext(node, i);
else
{
- Assert(bms_is_member(i, node->ms_valid_subplans));
- node->ms_slots[i] = ExecProcNode(node->mergeplans[i]);
+ Assert(bms_is_member(i, node->ms.valid_subplans));
+ node->ms_slots[i] = ExecProcNode(node->ms.plans[i]);
}
if (!TupIsNull(node->ms_slots[i]))
binaryheap_replace_first(node->ms_heap, Int32GetDatum(i));
@@ -360,7 +227,7 @@ ExecMergeAppend(PlanState *pstate)
if (binaryheap_empty(node->ms_heap))
{
/* All the subplans are exhausted, and so is the heap */
- result = ExecClearTuple(node->ps.ps_ResultTupleSlot);
+ result = ExecClearTuple(node->ms.ps.ps_ResultTupleSlot);
}
else
{
@@ -426,81 +293,21 @@ heap_compare_slots(Datum a, Datum b, void *arg)
void
ExecEndMergeAppend(MergeAppendState *node)
{
- PlanState **mergeplans;
- int nplans;
- int i;
-
- /*
- * get information from the node
- */
- mergeplans = node->mergeplans;
- nplans = node->ms_nplans;
-
- /*
- * shut down each of the subscans
- */
- for (i = 0; i < nplans; i++)
- ExecEndNode(mergeplans[i]);
+ ExecEndAppender(&node->ms);
}
void
ExecReScanMergeAppend(MergeAppendState *node)
{
- int i;
- int nasyncplans = node->ms_nasyncplans;
+ int nasyncplans = node->ms.nasyncplans;
- /*
- * If any PARAM_EXEC Params used in pruning expressions have changed, then
- * we'd better unset the valid subplans so that they are reselected for
- * the new parameter values.
- */
- if (node->ms_prune_state &&
- bms_overlap(node->ps.chgParam,
- node->ms_prune_state->execparamids))
- {
- node->ms_valid_subplans_identified = false;
- bms_free(node->ms_valid_subplans);
- node->ms_valid_subplans = NULL;
- bms_free(node->ms_valid_asyncplans);
- node->ms_valid_asyncplans = NULL;
- }
-
- for (i = 0; i < node->ms_nplans; i++)
- {
- PlanState *subnode = node->mergeplans[i];
-
- /*
- * ExecReScan doesn't know about my subplans, so I have to do
- * changed-parameter signaling myself.
- */
- if (node->ps.chgParam != NULL)
- UpdateChangedParamSet(subnode, node->ps.chgParam);
-
- /*
- * If chgParam of subnode is not null then plan will be re-scanned by
- * first ExecProcNode.
- */
- if (subnode->chgParam == NULL)
- ExecReScan(subnode);
- }
+ ExecReScanAppender(&node->ms);
- /* Reset async state */
+ /* Reset specific merge append async state */
if (nasyncplans > 0)
{
- i = -1;
- while ((i = bms_next_member(node->ms_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->ms_asyncrequests[i];
-
- areq->callback_pending = false;
- areq->request_complete = false;
- areq->result = NULL;
- }
-
bms_free(node->ms_asyncremain);
node->ms_asyncremain = NULL;
- bms_free(node->ms_needrequest);
- node->ms_needrequest = NULL;
bms_free(node->ms_has_asyncresults);
node->ms_has_asyncresults = NULL;
}
@@ -519,10 +326,10 @@ ExecReScanMergeAppend(MergeAppendState *node)
static void
classify_matching_subplans(MergeAppendState *node)
{
- Assert(node->ms_valid_subplans_identified);
+ Assert(node->ms.valid_subplans_identified);
/* Nothing to do if there are no valid subplans. */
- if (bms_is_empty(node->ms_valid_subplans))
+ if (bms_is_empty(node->ms.valid_subplans))
{
node->ms_asyncremain = NULL;
return;
@@ -530,9 +337,9 @@ classify_matching_subplans(MergeAppendState *node)
/* No valid async subplans identified. */
if (!classify_matching_subplans_common(
- &node->ms_valid_subplans,
- node->ms_asyncplans,
- &node->ms_valid_asyncplans))
+ &node->ms.valid_subplans,
+ node->ms.asyncplans,
+ &node->ms.valid_asyncplans))
node->ms_asyncremain = NULL;
}
@@ -545,39 +352,17 @@ classify_matching_subplans(MergeAppendState *node)
static void
ExecMergeAppendAsyncBegin(MergeAppendState *node)
{
- int i;
-
- /* Backward scan is not supported by async-aware MergeAppends. */
- Assert(ScanDirectionIsForward(node->ps.state->es_direction));
-
- /* We should never be called when there are no subplans */
- Assert(node->ms_nplans > 0);
-
- /* We should never be called when there are no async subplans. */
- Assert(node->ms_nasyncplans > 0);
-
/* ExecMergeAppend() identifies valid subplans */
- Assert(node->ms_valid_subplans_identified);
+ Assert(node->ms.valid_subplans_identified);
/* Initialize state variables. */
- node->ms_asyncremain = bms_copy(node->ms_valid_asyncplans);
+ node->ms_asyncremain = bms_copy(node->ms.valid_asyncplans);
/* Nothing to do if there are no valid async subplans. */
if (bms_is_empty(node->ms_asyncremain))
return;
- /* Make a request for each of the valid async subplans. */
- i = -1;
- while ((i = bms_next_member(node->ms_valid_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->ms_asyncrequests[i];
-
- Assert(areq->request_index == i);
- Assert(!areq->callback_pending);
-
- /* Do the actual work. */
- ExecAsyncRequest(areq);
- }
+ ExecAppenderAsyncBegin(&node->ms);
}
/* ----------------------------------------------------------------
@@ -638,7 +423,7 @@ ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
*/
if (bms_is_member(mplan, node->ms_has_asyncresults))
{
- node->ms_slots[mplan] = node->ms_asyncresults[mplan];
+ node->ms_slots[mplan] = node->ms.asyncresults[mplan];
return true;
}
@@ -648,7 +433,7 @@ ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
*/
needrequest = NULL;
i = -1;
- while ((i = bms_next_member(node->ms_needrequest, i)) >= 0)
+ while ((i = bms_next_member(node->ms.needrequest, i)) >= 0)
{
if (!bms_is_member(i, node->ms_has_asyncresults))
needrequest = bms_add_member(needrequest, i);
@@ -661,13 +446,13 @@ ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
return false;
/* Clear ms_needrequest flag, as we are going to send requests now */
- node->ms_needrequest = bms_del_members(node->ms_needrequest, needrequest);
+ node->ms.needrequest = bms_del_members(node->ms.needrequest, needrequest);
/* Make a new request for each of the async subplans that need it. */
i = -1;
while ((i = bms_next_member(needrequest, i)) >= 0)
{
- AsyncRequest *areq = node->ms_asyncrequests[i];
+ AsyncRequest *areq = node->ms.asyncrequests[i];
/*
* We've just checked that subplan doesn't already have some fetched
@@ -683,7 +468,7 @@ ExecMergeAppendAsyncRequest(MergeAppendState *node, int mplan)
/* Return needed asynchronously-generated results if any. */
if (bms_is_member(mplan, node->ms_has_asyncresults))
{
- node->ms_slots[mplan] = node->ms_asyncresults[mplan];
+ node->ms_slots[mplan] = node->ms.asyncresults[mplan];
return true;
}
@@ -707,7 +492,7 @@ ExecAsyncMergeAppendResponse(AsyncRequest *areq)
/* We should handle previous async result prior to getting new one */
Assert(!bms_is_member(areq->request_index, node->ms_has_asyncresults));
- node->ms_asyncresults[areq->request_index] = NULL;
+ node->ms.asyncresults[areq->request_index] = NULL;
/* Nothing to do if the request is pending. */
if (!areq->request_complete)
{
@@ -730,13 +515,13 @@ ExecAsyncMergeAppendResponse(AsyncRequest *areq)
node->ms_has_asyncresults = bms_add_member(node->ms_has_asyncresults,
areq->request_index);
/* Save result so we can return it. */
- node->ms_asyncresults[areq->request_index] = slot;
+ node->ms.asyncresults[areq->request_index] = slot;
/*
* Mark the subplan that returned a result as ready for a new request. We
* don't launch another one here immediately because it might complete.
*/
- node->ms_needrequest = bms_add_member(node->ms_needrequest,
+ node->ms.needrequest = bms_add_member(node->ms.needrequest,
areq->request_index);
}
@@ -749,101 +534,8 @@ ExecAsyncMergeAppendResponse(AsyncRequest *areq)
static void
ExecMergeAppendAsyncEventWait(MergeAppendState *node)
{
- int nevents = node->ms_nasyncplans + 2; /* one for PM death and
- * one for latch */
- WaitEvent occurred_event[EVENT_BUFFER_SIZE];
- int noccurred;
- int i;
-
/* We should never be called when there are no valid async subplans. */
Assert(bms_num_members(node->ms_asyncremain) > 0);
- node->ms_eventset = CreateWaitEventSet(CurrentResourceOwner, nevents);
- AddWaitEventToSet(node->ms_eventset, WL_EXIT_ON_PM_DEATH, PGINVALID_SOCKET,
- NULL, NULL);
-
- /* Give each waiting subplan a chance to add an event. */
- i = -1;
- while ((i = bms_next_member(node->ms_asyncplans, i)) >= 0)
- {
- AsyncRequest *areq = node->ms_asyncrequests[i];
-
- if (areq->callback_pending)
- ExecAsyncConfigureWait(areq);
- }
-
- /*
- * No need for further processing if none of the subplans configured any
- * events.
- */
- if (GetNumRegisteredWaitEvents(node->ms_eventset) == 1)
- {
- FreeWaitEventSet(node->ms_eventset);
- node->ms_eventset = NULL;
- return;
- }
-
- /*
- * Add the process latch to the set, so that we wake up to process the
- * standard interrupts with CHECK_FOR_INTERRUPTS().
- *
- * NOTE: For historical reasons, it's important that this is added to the
- * WaitEventSet after the ExecAsyncConfigureWait() calls. Namely,
- * postgres_fdw calls "GetNumRegisteredWaitEvents(set) == 1" to check if
- * any other events are in the set. That's a poor design, it's
- * questionable for postgres_fdw to be doing that in the first place, but
- * we cannot change it now. The pattern has possibly been copied to other
- * extensions too.
- */
- AddWaitEventToSet(node->ms_eventset, WL_LATCH_SET, PGINVALID_SOCKET,
- MyLatch, NULL);
-
- /* Return at most EVENT_BUFFER_SIZE events in one call. */
- if (nevents > EVENT_BUFFER_SIZE)
- nevents = EVENT_BUFFER_SIZE;
-
- /*
- * Wait until at least one event occurs.
- */
- noccurred = WaitEventSetWait(node->ms_eventset, -1 /* no timeout */ , occurred_event,
- nevents, WAIT_EVENT_APPEND_READY);
- FreeWaitEventSet(node->ms_eventset);
- node->ms_eventset = NULL;
- if (noccurred == 0)
- return;
-
- /* Deliver notifications. */
- for (i = 0; i < noccurred; i++)
- {
- WaitEvent *w = &occurred_event[i];
-
- /*
- * Each waiting subplan should have registered its wait event with
- * user_data pointing back to its AsyncRequest.
- */
- if ((w->events & WL_SOCKET_READABLE) != 0)
- {
- AsyncRequest *areq = (AsyncRequest *) w->user_data;
-
- if (areq->callback_pending)
- {
- /*
- * Mark it as no longer needing a callback. We must do this
- * before dispatching the callback in case the callback resets
- * the flag.
- */
- areq->callback_pending = false;
-
- /* Do the actual work. */
- ExecAsyncNotify(areq);
- }
- }
-
- /* Handle standard interrupts */
- if ((w->events & WL_LATCH_SET) != 0)
- {
- ResetLatch(MyLatch);
- CHECK_FOR_INTERRUPTS();
- }
- }
+ ExecAppenderAsyncEventWait(&node->ms, -1 /* no timeout */ , WAIT_EVENT_APPEND_READY);
}
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 024a2b2fd84..2f4e2ae6d39 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -4751,14 +4751,14 @@ planstate_tree_walker_impl(PlanState *planstate,
switch (nodeTag(plan))
{
case T_Append:
- if (planstate_walk_members(((AppendState *) planstate)->appendplans,
- ((AppendState *) planstate)->as_nplans,
+ if (planstate_walk_members(((AppendState *) planstate)->as.plans,
+ ((AppendState *) planstate)->as.nplans,
walker, context))
return true;
break;
case T_MergeAppend:
- if (planstate_walk_members(((MergeAppendState *) planstate)->mergeplans,
- ((MergeAppendState *) planstate)->ms_nplans,
+ if (planstate_walk_members(((MergeAppendState *) planstate)->ms.plans,
+ ((MergeAppendState *) planstate)->ms.nplans,
walker, context))
return true;
break;
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index fc0e5c62ea9..a7b7e3d45e0 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1262,11 +1262,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
* child plans, to make cross-checking the sort info easier.
*/
plan = makeNode(Append);
- plan->plan.targetlist = tlist;
- plan->plan.qual = NIL;
- plan->plan.lefttree = NULL;
- plan->plan.righttree = NULL;
- plan->apprelids = rel->relids;
+ plan->ap.plan.targetlist = tlist;
+ plan->ap.plan.qual = NIL;
+ plan->ap.plan.lefttree = NULL;
+ plan->ap.plan.righttree = NULL;
+ plan->ap.apprelids = rel->relids;
if (pathkeys != NIL)
{
@@ -1285,7 +1285,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
&nodeSortOperators,
&nodeCollations,
&nodeNullsFirst);
- tlist_was_changed = (orig_tlist_length != list_length(plan->plan.targetlist));
+ tlist_was_changed = (orig_tlist_length != list_length(plan->ap.plan.targetlist));
}
/* If appropriate, consider async append */
@@ -1395,7 +1395,7 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
/* Set below if we find quals that we can use to run-time prune */
- plan->part_prune_index = -1;
+ plan->ap.part_prune_index = -1;
/*
* If any quals exist, they may be useful to perform further partition
@@ -1420,16 +1420,16 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ plan->ap.part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
prunequal);
}
- plan->appendplans = subplans;
+ plan->ap.subplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- copy_generic_path_info(&plan->plan, (Path *) best_path);
+ copy_generic_path_info(&plan->ap.plan, (Path *) best_path);
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
@@ -1438,9 +1438,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
*/
if (tlist_was_changed && (flags & (CP_EXACT_TLIST | CP_SMALL_TLIST)))
{
- tlist = list_copy_head(plan->plan.targetlist, orig_tlist_length);
+ tlist = list_copy_head(plan->ap.plan.targetlist, orig_tlist_length);
return inject_projection_plan((Plan *) plan, tlist,
- plan->plan.parallel_safe);
+ plan->ap.plan.parallel_safe);
}
else
return (Plan *) plan;
@@ -1458,7 +1458,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
int flags)
{
MergeAppend *node = makeNode(MergeAppend);
- Plan *plan = &node->plan;
+ Plan *plan = &node->ap.plan;
List *tlist = build_path_tlist(root, &best_path->path);
int orig_tlist_length = list_length(tlist);
bool tlist_was_changed;
@@ -1479,7 +1479,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
plan->qual = NIL;
plan->lefttree = NULL;
plan->righttree = NULL;
- node->apprelids = rel->relids;
+ node->ap.apprelids = rel->relids;
consider_async = (enable_async_merge_append &&
!best_path->path.parallel_safe &&
@@ -1593,7 +1593,7 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
/* Set below if we find quals that we can use to run-time prune */
- node->part_prune_index = -1;
+ node->ap.part_prune_index = -1;
/*
* If any quals exist, they may be useful to perform further partition
@@ -1610,12 +1610,12 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
Assert(best_path->path.param_info == NULL);
if (prunequal != NIL)
- node->part_prune_index = make_partition_pruneinfo(root, rel,
+ node->ap.part_prune_index = make_partition_pruneinfo(root, rel,
best_path->subpaths,
prunequal);
}
- node->mergeplans = subplans;
+ node->ap.subplans = subplans;
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index cd7ea1e6b58..a595f34c87b 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1850,10 +1850,10 @@ set_append_references(PlannerInfo *root,
* check quals. If it's got exactly one child plan, then it's not doing
* anything useful at all, and we can strip it out.
*/
- Assert(aplan->plan.qual == NIL);
+ Assert(aplan->ap.plan.qual == NIL);
/* First, we gotta recurse on the children */
- foreach(l, aplan->appendplans)
+ foreach(l, aplan->ap.subplans)
{
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
@@ -1866,11 +1866,11 @@ set_append_references(PlannerInfo *root,
* plan may execute the non-parallel aware child multiple times. (If you
* change these rules, update create_append_path to match.)
*/
- if (list_length(aplan->appendplans) == 1)
+ if (list_length(aplan->ap.subplans) == 1)
{
- Plan *p = (Plan *) linitial(aplan->appendplans);
+ Plan *p = (Plan *) linitial(aplan->ap.subplans);
- if (p->parallel_aware == aplan->plan.parallel_aware)
+ if (p->parallel_aware == aplan->ap.plan.parallel_aware)
return clean_up_removed_plan_level((Plan *) aplan, p);
}
@@ -1881,19 +1881,19 @@ set_append_references(PlannerInfo *root,
*/
set_dummy_tlist_references((Plan *) aplan, rtoffset);
- aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
+ aplan->ap.apprelids = offset_relid_set(aplan->ap.apprelids, rtoffset);
/*
* Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
* Also update the RT indexes present in it to add the offset.
*/
- if (aplan->part_prune_index >= 0)
- aplan->part_prune_index =
- register_partpruneinfo(root, aplan->part_prune_index, rtoffset);
+ if (aplan->ap.part_prune_index >= 0)
+ aplan->ap.part_prune_index =
+ register_partpruneinfo(root, aplan->ap.part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
- Assert(aplan->plan.lefttree == NULL);
- Assert(aplan->plan.righttree == NULL);
+ Assert(aplan->ap.plan.lefttree == NULL);
+ Assert(aplan->ap.plan.righttree == NULL);
return (Plan *) aplan;
}
@@ -1917,10 +1917,10 @@ set_mergeappend_references(PlannerInfo *root,
* or check quals. If it's got exactly one child plan, then it's not
* doing anything useful at all, and we can strip it out.
*/
- Assert(mplan->plan.qual == NIL);
+ Assert(mplan->ap.plan.qual == NIL);
/* First, we gotta recurse on the children */
- foreach(l, mplan->mergeplans)
+ foreach(l, mplan->ap.subplans)
{
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
@@ -1934,11 +1934,11 @@ set_mergeappend_references(PlannerInfo *root,
* multiple times. (If you change these rules, update
* create_merge_append_path to match.)
*/
- if (list_length(mplan->mergeplans) == 1)
+ if (list_length(mplan->ap.subplans) == 1)
{
- Plan *p = (Plan *) linitial(mplan->mergeplans);
+ Plan *p = (Plan *) linitial(mplan->ap.subplans);
- if (p->parallel_aware == mplan->plan.parallel_aware)
+ if (p->parallel_aware == mplan->ap.plan.parallel_aware)
return clean_up_removed_plan_level((Plan *) mplan, p);
}
@@ -1949,19 +1949,19 @@ set_mergeappend_references(PlannerInfo *root,
*/
set_dummy_tlist_references((Plan *) mplan, rtoffset);
- mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
+ mplan->ap.apprelids = offset_relid_set(mplan->ap.apprelids, rtoffset);
/*
* Add PartitionPruneInfo, if any, to PlannerGlobal and update the index.
* Also update the RT indexes present in it to add the offset.
*/
- if (mplan->part_prune_index >= 0)
- mplan->part_prune_index =
- register_partpruneinfo(root, mplan->part_prune_index, rtoffset);
+ if (mplan->ap.part_prune_index >= 0)
+ mplan->ap.part_prune_index =
+ register_partpruneinfo(root, mplan->ap.part_prune_index, rtoffset);
/* We don't need to recurse to lefttree or righttree ... */
- Assert(mplan->plan.lefttree == NULL);
- Assert(mplan->plan.righttree == NULL);
+ Assert(mplan->ap.plan.lefttree == NULL);
+ Assert(mplan->ap.plan.righttree == NULL);
return (Plan *) mplan;
}
diff --git a/src/backend/optimizer/plan/subselect.c b/src/backend/optimizer/plan/subselect.c
index ff63d20f8d5..eb616c977bc 100644
--- a/src/backend/optimizer/plan/subselect.c
+++ b/src/backend/optimizer/plan/subselect.c
@@ -2759,7 +2759,7 @@ finalize_plan(PlannerInfo *root, Plan *plan,
case T_Append:
{
- foreach(l, ((Append *) plan)->appendplans)
+ foreach(l, ((Append *) plan)->ap.subplans)
{
context.paramids =
bms_add_members(context.paramids,
@@ -2774,7 +2774,7 @@ finalize_plan(PlannerInfo *root, Plan *plan,
case T_MergeAppend:
{
- foreach(l, ((MergeAppend *) plan)->mergeplans)
+ foreach(l, ((MergeAppend *) plan)->ap.subplans)
{
context.paramids =
bms_add_members(context.paramids,
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 9f85eb86da1..ce57f80e5e3 100644
--- a/src/backend/utils/adt/ruleutils.c
+++ b/src/backend/utils/adt/ruleutils.c
@@ -5163,9 +5163,9 @@ set_deparse_plan(deparse_namespace *dpns, Plan *plan)
* natural choice.
*/
if (IsA(plan, Append))
- dpns->outer_plan = linitial(((Append *) plan)->appendplans);
+ dpns->outer_plan = linitial(((Append *) plan)->ap.subplans);
else if (IsA(plan, MergeAppend))
- dpns->outer_plan = linitial(((MergeAppend *) plan)->mergeplans);
+ dpns->outer_plan = linitial(((MergeAppend *) plan)->ap.subplans);
else
dpns->outer_plan = outerPlan(plan);
@@ -7955,10 +7955,10 @@ resolve_special_varno(Node *node, deparse_context *context,
if (IsA(dpns->plan, Append))
context->appendparents = bms_union(context->appendparents,
- ((Append *) dpns->plan)->apprelids);
+ ((Append *) dpns->plan)->ap.apprelids);
else if (IsA(dpns->plan, MergeAppend))
context->appendparents = bms_union(context->appendparents,
- ((MergeAppend *) dpns->plan)->apprelids);
+ ((MergeAppend *) dpns->plan)->ap.apprelids);
push_child_plan(dpns, dpns->outer_plan, &save_dpns);
resolve_special_varno((Node *) tle->expr, context,
diff --git a/src/include/executor/execAppend.h b/src/include/executor/execAppend.h
new file mode 100644
index 00000000000..b5509373524
--- /dev/null
+++ b/src/include/executor/execAppend.h
@@ -0,0 +1,60 @@
+/*-------------------------------------------------------------------------
+ * execAppend.h
+ * Support functions for MergeAppend and Append nodes.
+ *
+ * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * IDENTIFICATION
+ * src/include/executor/execAppend.h
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef EXECAPPEND_H
+#define EXECAPPEND_H
+
+#include "nodes/execnodes.h"
+
+void ExecInitAppender(AppenderState * state,
+ Appender * node,
+ EState *estate,
+ int eflags,
+ int first_partial_plan,
+ int *first_valid_partial_plan);
+
+void ExecEndAppender(AppenderState * node);
+
+void ExecReScanAppender(AppenderState * node);
+
+void ExecAppenderAsyncBegin(AppenderState * node);
+
+void ExecAppenderAsyncEventWait(AppenderState * node,
+ int timeout,
+ uint32 wait_event_info);
+
+/* Common part of classify_matching_subplans() for Append and MergeAppend */
+static inline bool
+classify_matching_subplans_common(Bitmapset **valid_subplans,
+ Bitmapset *asyncplans,
+ Bitmapset **valid_asyncplans)
+{
+ Assert(*valid_asyncplans == NULL);
+
+ /* Checked by classify_matching_subplans() */
+ Assert(!bms_is_empty(*valid_subplans));
+
+ /* Nothing to do if there are no valid async subplans. */
+ if (!bms_overlap(*valid_subplans, asyncplans))
+ return false;
+
+ /* Get valid async subplans. */
+ *valid_asyncplans = bms_intersect(asyncplans,
+ *valid_subplans);
+
+ /* Adjust the valid subplans to contain sync subplans only. */
+ *valid_subplans = bms_del_members(*valid_subplans,
+ *valid_asyncplans);
+ return true;
+}
+
+#endif /* EXECAPPEND_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5887cbf4f16..501f2c3acf4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1472,6 +1472,27 @@ typedef struct ModifyTableState
List *mt_mergeJoinConditions;
} ModifyTableState;
+typedef struct AppenderState
+{
+ PlanState ps; /* its first field is NodeTag */
+ PlanState **plans; /* array of PlanStates for my inputs */
+ int nplans;
+
+ /* Asynchronous execution state */
+ Bitmapset *asyncplans; /* asynchronous plans indexes */
+ int nasyncplans; /* # of asynchronous plans */
+ AsyncRequest **asyncrequests; /* array of AsyncRequests */
+ TupleTableSlot **asyncresults; /* unreturned results of async plans */
+ Bitmapset *needrequest; /* asynchronous plans needing a new request */
+ struct WaitEventSet *eventset; /* WaitEventSet for file descriptor waits */
+
+ /* Partition pruning state */
+ struct PartitionPruneState *prune_state;
+ bool valid_subplans_identified;
+ Bitmapset *valid_subplans;
+ Bitmapset *valid_asyncplans; /* valid asynchronous plans indexes */
+} AppenderState;
+
/* ----------------
* AppendState information
*
@@ -1493,31 +1514,20 @@ struct PartitionPruneState;
struct AppendState
{
- PlanState ps; /* its first field is NodeTag */
- PlanState **appendplans; /* array of PlanStates for my inputs */
- int as_nplans;
+ AppenderState as;
+
int as_whichplan;
bool as_begun; /* false means need to initialize */
- Bitmapset *as_asyncplans; /* asynchronous plans indexes */
- int as_nasyncplans; /* # of asynchronous plans */
- AsyncRequest **as_asyncrequests; /* array of AsyncRequests */
- TupleTableSlot **as_asyncresults; /* unreturned results of async plans */
- int as_nasyncresults; /* # of valid entries in as_asyncresults */
- bool as_syncdone; /* true if all synchronous plans done in
- * asynchronous mode, else false */
+ int as_nasyncresults; /* # of valid entries in asyncresults */
+ bool as_syncdone; /* all sync plans done in async mode? */
int as_nasyncremain; /* # of remaining asynchronous plans */
- Bitmapset *as_needrequest; /* asynchronous plans needing a new request */
- struct WaitEventSet *as_eventset; /* WaitEventSet used to configure file
- * descriptor wait events */
- int as_first_partial_plan; /* Index of 'appendplans' containing
- * the first partial plan */
- ParallelAppendState *as_pstate; /* parallel coordination info */
- Size pstate_len; /* size of parallel coordination info */
- struct PartitionPruneState *as_prune_state;
- bool as_valid_subplans_identified; /* is as_valid_subplans valid? */
- Bitmapset *as_valid_subplans;
- Bitmapset *as_valid_asyncplans; /* valid asynchronous plans indexes */
- bool (*choose_next_subplan) (AppendState *);
+ int as_first_partial_plan;
+
+ /* Parallel append specific */
+ ParallelAppendState *as_pstate;
+ Size pstate_len;
+
+ bool (*choose_next_subplan) (struct AppendState *);
};
/* ----------------
@@ -1537,27 +1547,17 @@ struct AppendState
*/
typedef struct MergeAppendState
{
- PlanState ps; /* its first field is NodeTag */
- PlanState **mergeplans; /* array of PlanStates for my inputs */
- int ms_nplans;
+ AppenderState ms;
+
int ms_nkeys;
SortSupport ms_sortkeys; /* array of length ms_nkeys */
TupleTableSlot **ms_slots; /* array of length ms_nplans */
struct binaryheap *ms_heap; /* binary heap of slot indices */
bool ms_initialized; /* are subplans started? */
- Bitmapset *ms_asyncplans; /* asynchronous plans indexes */
- int ms_nasyncplans; /* # of asynchronous plans */
- AsyncRequest **ms_asyncrequests; /* array of AsyncRequests */
- TupleTableSlot **ms_asyncresults; /* unreturned results of async plans */
+
+ /* Merge-specific async tracking */
Bitmapset *ms_has_asyncresults; /* plans which have async results */
Bitmapset *ms_asyncremain; /* remaining asynchronous plans */
- Bitmapset *ms_needrequest; /* asynchronous plans needing a new request */
- struct WaitEventSet *ms_eventset; /* WaitEventSet used to configure file
- * descriptor wait events */
- struct PartitionPruneState *ms_prune_state;
- bool ms_valid_subplans_identified; /* is ms_valid_subplans valid? */
- Bitmapset *ms_valid_subplans;
- Bitmapset *ms_valid_asyncplans; /* valid asynchronous plans indexes */
} MergeAppendState;
/* Getters for AppendState and MergeAppendState */
@@ -1567,9 +1567,9 @@ GetAppendEventSet(PlanState *ps)
Assert(IsA(ps, AppendState) || IsA(ps, MergeAppendState));
if (IsA(ps, AppendState))
- return ((AppendState *) ps)->as_eventset;
+ return ((AppendState *) ps)->as.eventset;
else
- return ((MergeAppendState *) ps)->ms_eventset;
+ return ((MergeAppendState *) ps)->ms.eventset;
}
static inline Bitmapset *
@@ -1578,34 +1578,9 @@ GetNeedRequest(PlanState *ps)
Assert(IsA(ps, AppendState) || IsA(ps, MergeAppendState));
if (IsA(ps, AppendState))
- return ((AppendState *) ps)->as_needrequest;
+ return ((AppendState *) ps)->as.needrequest;
else
- return ((MergeAppendState *) ps)->ms_needrequest;
-}
-
-/* Common part of classify_matching_subplans() for Append and MergeAppend */
-static inline bool
-classify_matching_subplans_common(Bitmapset **valid_subplans,
- Bitmapset *asyncplans,
- Bitmapset **valid_asyncplans)
-{
- Assert(*valid_asyncplans == NULL);
-
- /* Checked by classify_matching_subplans() */
- Assert(!bms_is_empty(*valid_subplans));
-
- /* Nothing to do if there are no valid async subplans. */
- if (!bms_overlap(*valid_subplans, asyncplans))
- return false;
-
- /* Get valid async subplans. */
- *valid_asyncplans = bms_intersect(asyncplans,
- *valid_subplans);
-
- /* Adjust the valid subplans to contain sync subplans only. */
- *valid_subplans = bms_del_members(*valid_subplans,
- *valid_asyncplans);
- return true;
+ return ((MergeAppendState *) ps)->ms.needrequest;
}
/* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..30c20e80b40 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -380,6 +380,20 @@ typedef struct ModifyTable
struct PartitionPruneInfo; /* forward reference to struct below */
+typedef struct Appender
+{
+ Plan plan; /* its first field is NodeTag */
+ Bitmapset *apprelids; /* RTIs of appendrel(s) formed by this node */
+ List *subplans; /* List of Plans (formerly
+ * appendplans/mergeplans) */
+
+ /*
+ * Index into PlannedStmt.partPruneInfos and parallel lists in EState. Set
+ * to -1 if no run-time pruning is used.
+ */
+ int part_prune_index;
+} Appender;
+
/* ----------------
* Append node -
* Generate the concatenation of the results of sub-plans.
@@ -387,25 +401,16 @@ struct PartitionPruneInfo; /* forward reference to struct below */
*/
typedef struct Append
{
- Plan plan;
- /* RTIs of appendrel(s) formed by this node */
- Bitmapset *apprelids;
- List *appendplans;
+ Appender ap;
+
/* # of asynchronous plans */
int nasyncplans;
/*
- * All 'appendplans' preceding this index are non-partial plans. All
- * 'appendplans' from this index onwards are partial plans.
+ * All 'subplans' preceding this index are non-partial plans. All
+ * 'subplans' from this index onwards are partial plans.
*/
int first_partial_plan;
-
- /*
- * Index into PlannedStmt.partPruneInfos and parallel lists in EState:
- * es_part_prune_states and es_part_prune_results. Set to -1 if no
- * run-time pruning is used.
- */
- int part_prune_index;
} Append;
/* ----------------
@@ -415,12 +420,7 @@ typedef struct Append
*/
typedef struct MergeAppend
{
- Plan plan;
-
- /* RTIs of appendrel(s) formed by this node */
- Bitmapset *apprelids;
-
- List *mergeplans;
+ Appender ap;
/* these fields are just like the sort-key info in struct Sort: */
@@ -438,13 +438,6 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
-
- /*
- * Index into PlannedStmt.partPruneInfos and parallel lists in EState:
- * es_part_prune_states and es_part_prune_results. Set to -1 if no
- * run-time pruning is used.
- */
- int part_prune_index;
} MergeAppend;
/* ----------------
--
2.51.2
view thread (33+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected]
Subject: Re: Asynchronous MergeAppend
In-Reply-To: <[email protected]>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox