Subject: generic plans and "initial" pruning
From: Amit Langote @ 2021-12-25 03:36 UTC
To: pgsql-hackers
Executing generic plans involving partitions is known to become slower
as partition count grows due to a number of bottlenecks, with
AcquireExecutorLocks() showing at the top in profiles.
A previous attempt at solving that problem was made by David Rowley [1],
who proposed delaying the locking of *all* partitions appearing under
an Append/MergeAppend until "initial" pruning is done during the
executor initialization phase. A problem with that approach, which he
described in [2], is that leaving partitions unlocked can lead to
race conditions: the Plan node belonging to a partition can be
invalidated if a concurrent session alters the partition between
AcquireExecutorLocks() declaring the plan okay to execute and the
plan actually being executed.
However, using an idea that Robert suggested to me off-list a little
while back, it seems possible to determine the set of partitions that
we can safely skip locking. The idea is to look at the "initial" or
"pre-execution" pruning instructions contained in a given Append or
MergeAppend node when AcquireExecutorLocks() is collecting the
relations to lock and consider relations from only those sub-nodes
that survive performing those instructions. I've attempted
implementing that idea in the attached patch.
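To illustrate the idea outside of PostgreSQL, here is a minimal, self-contained
sketch. It is not code from the patch: the ToySubplan type, the range-based
pruning test, and the 64-bit mask standing in for a Bitmapset are all
hypothetical simplifications, and it assumes the partitioned parent itself is
always locked. It only shows the shape of the computation: run the pruning
test first, then add to the lock set only the RT indexes of sub-nodes that
survive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these types exist in PostgreSQL. */
typedef struct ToySubplan
{
    int rtindex;    /* range-table index of the partition this subplan scans */
    int lower;      /* inclusive lower bound of the partition key range */
    int upper;      /* exclusive upper bound of the partition key range */
} ToySubplan;

/* Toy stand-in for a Bitmapset: bit n set means "RT index n is a member". */
static uint64_t
toy_bit(int rtindex)
{
    assert(rtindex >= 0 && rtindex < 64);
    return UINT64_C(1) << rtindex;
}

/* Toy "initial" pruning: keep a subplan only if its range can hold 'param'. */
static bool
toy_survives_initial_pruning(const ToySubplan *sp, int param)
{
    return param >= sp->lower && param < sp->upper;
}

/*
 * Compute the set of RT indexes to lock: the parent (assumed here to be
 * locked unconditionally) plus only those partitions whose subplans
 * survive the pruning test.
 */
uint64_t
toy_relations_to_lock(int parent_rtindex,
                      const ToySubplan *subplans, size_t nsubplans,
                      int param)
{
    uint64_t lockset = toy_bit(parent_rtindex);

    for (size_t i = 0; i < nsubplans; i++)
    {
        if (toy_survives_initial_pruning(&subplans[i], param))
            lockset |= toy_bit(subplans[i].rtindex);
    }
    return lockset;
}
```

With three range partitions at RT indexes 2, 3 and 4 under a parent at RT
index 1, a parameter value that falls into the second range yields a lock set
containing only RT indexes 1 and 3, rather than all four relations.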
Note that "initial" pruning steps are now performed twice when
executing generic plans: once in AcquireExecutorLocks() to find
partitions to be locked, and a 2nd time in ExecInit[Merge]Append() to
determine the set of partition sub-nodes to be initialized for
execution, though I wasn't able to come up with a good idea to avoid
this duplication.
Using the following benchmark setup:
pgbench testdb -i --partitions=$nparts > /dev/null 2>&1
pgbench -n testdb -S -T 30 -Mprepared
and plan_cache_mode = force_generic_plan, I get the following numbers
(the first column is the partition count, $nparts):
HEAD:
32 tps = 20561.776403 (without initial connection time)
64 tps = 12553.131423 (without initial connection time)
128 tps = 13330.365696 (without initial connection time)
256 tps = 8605.723120 (without initial connection time)
512 tps = 4435.951139 (without initial connection time)
1024 tps = 2346.902973 (without initial connection time)
2048 tps = 1334.680971 (without initial connection time)
Patched:
32 tps = 27554.156077 (without initial connection time)
64 tps = 27531.161310 (without initial connection time)
128 tps = 27138.305677 (without initial connection time)
256 tps = 25825.467724 (without initial connection time)
512 tps = 19864.386305 (without initial connection time)
1024 tps = 18742.668944 (without initial connection time)
2048 tps = 16312.412704 (without initial connection time)
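To make the comparison easier to read, the ratio of patched to HEAD throughput
can be computed per partition count. The helper below is only an illustrative
sketch (the function name speedup_for_nparts is made up for this example); the
figures are copied verbatim from the two lists above. The ratio goes from
roughly 1.3x at 32 partitions to roughly 12x at 2048 partitions.

```c
#include <assert.h>
#include <stddef.h>

/* Figures copied from the benchmark output above. */
static const int nparts[] = {32, 64, 128, 256, 512, 1024, 2048};
static const double head_tps[] = {
    20561.776403, 12553.131423, 13330.365696, 8605.723120,
    4435.951139, 2346.902973, 1334.680971
};
static const double patched_tps[] = {
    27554.156077, 27531.161310, 27138.305677, 25825.467724,
    19864.386305, 18742.668944, 16312.412704
};

/*
 * Return patched tps divided by HEAD tps for the given partition count,
 * or -1.0 if that partition count was not part of the benchmark.
 */
double
speedup_for_nparts(int n)
{
    size_t count = sizeof(nparts) / sizeof(nparts[0]);

    for (size_t i = 0; i < count; i++)
    {
        if (nparts[i] == n)
        {
            assert(head_tps[i] > 0.0);  /* guard the division */
            return patched_tps[i] / head_tps[i];
        }
    }
    return -1.0;
}
```

Calling speedup_for_nparts() in a loop over the seven partition counts and
printing the result gives one ratio per row of the benchmark.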
--
Amit Langote
EDB: http://www.enterprisedb.com
[1] https://www.postgresql.org/message-id/[email protected]...
[2] https://www.postgresql.org/message-id/CAKJS1f99JNe%2Bsw5E3qWmS%2BHeLMFaAhehKO67J1Ym3pXv0XBsxw%40mail...
Attachments:
[application/octet-stream] v1-0001-Teach-AcquireExecutorLocks-to-acquire-fewer-locks.patch (62.1K, 2-v1-0001-Teach-AcquireExecutorLocks-to-acquire-fewer-locks.patch)
From ed4de69e7ae180eca380ae581152b6650175661f Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v1] Teach AcquireExecutorLocks() to acquire fewer locks in
some cases
Currently, AcquireExecutorLocks() loops over the range table of a
given PlannedStmt and locks all relations found therein, even those
that won't actually be scanned during execution due to being
eliminated by "initial" pruning that is applied during the
initialization of their owning Append or MergeAppend node. This makes
AcquireExecutorLocks() itself do the "initial" pruning on nodes that
support it and lock only those relations that are contained in the
subnodes that survive the pruning.
To that end, AcquireExecutorLocks() now loops over a bitmapset of
RT indexes, those of the RTEs of "lockable" relations, instead of
the whole range table to find such entries. When pruning is possible,
the bitmapset is constructed by walking the plan tree to locate
nodes that allow "initial" (or "pre-execution") pruning and
disregarding relations from subnodes that don't survive the pruning
instructions.
PlannedStmt gets a bitmapset field to store the RT indexes of
lockable relations that is populated when constructing the flat range
table in setrefs.c. It is used as is in the absence of any prunable
nodes.
PlannedStmt also gets a new field that indicates whether any of the
nodes of the plan tree contain "initial" (or "pre-execution") pruning
steps, which saves the trouble of walking the plan tree only to find
whether that's the case.
ExecFindInitialMatchingSubPlans() is refactored to allow being
called outside a full-fledged executor context.
---
src/backend/executor/execParallel.c | 2 +
src/backend/executor/execPartition.c | 534 ++++++++++++++++++-------
src/backend/executor/nodeAppend.c | 39 +-
src/backend/executor/nodeMergeAppend.c | 39 +-
src/backend/nodes/copyfuncs.c | 4 +
src/backend/nodes/nodeFuncs.c | 121 +++++-
src/backend/nodes/outfuncs.c | 5 +
src/backend/nodes/readfuncs.c | 4 +
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 10 +
src/backend/partitioning/partprune.c | 57 ++-
src/backend/utils/cache/plancache.c | 217 +++++++++-
src/include/executor/execPartition.h | 13 +-
src/include/nodes/nodeFuncs.h | 3 +
src/include/nodes/pathnodes.h | 6 +
src/include/nodes/plannodes.h | 15 +
src/include/partitioning/partprune.h | 3 +
17 files changed, 866 insertions(+), 208 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f8a4a40e7b..d14e60724b 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -182,8 +182,10 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->usesPreExecPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
+ pstmt->relationRTIs = NULL;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5c723bc54e..8c63272398 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -186,7 +187,8 @@ static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1511,8 +1513,7 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
/*
* ExecCreatePartitionPruneState
- * Build the data structure required for calling
- * ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
+ * Build the data structure for run-time pruning
*
* 'planstate' is the parent plan node's execution state.
*
@@ -1526,10 +1527,20 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* as children. The data stored in each PartitionedRelPruningData can be
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
+ *
+ * This does not consider initial_pruning_steps because they must already have
+ * been performed by the caller and the subplans remaining after doing so are
+ * given as 'initially_valid_subplans'. The translation data to be put into
+ * PartitionPruneState that allows conversion of partition indexes into subplan
+ * indexes are updated here to account for the unneeded subplans having been
+ * removed by initial pruning. 'nsubplans' gives the number of subplans that
+ * were present before initial pruning.
*/
PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ Bitmapset *initially_valid_subplans,
+ int nsubplans)
{
EState *estate = planstate->state;
PartitionPruneState *prunestate;
@@ -1537,6 +1548,15 @@ ExecCreatePartitionPruneState(PlanState *planstate,
ListCell *lc;
int i;
+ /*
+ * Only create a PartitionPruneState if pruning needs to be performed
+ * during the execution of the owning plan. Note that this means the
+ * initial pruning steps which are used to determine the set of subplans
+ * that are valid for actual execution are performed without creating a
+ * PartitionPruneState; see ExecFindInitialMatchingSubPlans().
+ */
+ Assert(partitionpruneinfo->contains_exec_steps);
+
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
estate->es_partition_directory =
@@ -1555,7 +1575,6 @@ ExecCreatePartitionPruneState(PlanState *planstate,
prunestate->execparamids = NULL;
/* other_subplans can change at runtime, so we need our own copy */
prunestate->other_subplans = bms_copy(partitionpruneinfo->other_subplans);
- prunestate->do_initial_prune = false; /* may be set below */
prunestate->do_exec_prune = false; /* may be set below */
prunestate->num_partprunedata = n_part_hierarchies;
@@ -1702,23 +1721,17 @@ ExecCreatePartitionPruneState(PlanState *planstate,
pprune->present_parts = bms_copy(pinfo->present_parts);
/*
- * Initialize pruning contexts as needed.
+ * Initialize pruning contexts as needed, ignoring any
+ * initial_pruning_steps because they must already have been
+ * performed.
*/
- pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
- {
- ExecInitPruningContext(&pprune->initial_context,
- pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
- /* Record whether initial pruning is needed at any level */
- prunestate->do_initial_prune = true;
- }
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
if (pinfo->exec_pruning_steps)
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ planstate->ps_ExprContext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -1735,18 +1748,136 @@ ExecCreatePartitionPruneState(PlanState *planstate,
i++;
}
+ /*
+ * If exec-time pruning is required and subplans appear to have been
+ * pruned by initial pruning steps, then we must re-sequence the subplan
+ * indexes so that ExecFindMatchingSubPlans() properly returns the indexes
+ * of the subplans that have remained after initial pruning, that is,
+ * initially_valid_subplans.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in pruneinfo, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (prunestate->do_exec_prune &&
+ bms_num_members(initially_valid_subplans) < nsubplans)
+ {
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
+ int i;
+ int newidx;
+
+ /*
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
+ */
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
+ {
+ Assert(i < nsubplans);
+ new_subplan_indexes[i] = newidx++;
+ }
+
+ /*
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
+ */
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
+
+ /*
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
+ */
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
+ {
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
+
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
+
+ for (k = 0; k < nparts; k++)
+ {
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
+
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
+ {
+ Assert(oldidx < nsubplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
+
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
+
+ subprune = &prunedata->partrelprunedata[subidx];
+
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ }
+ }
+ }
+
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
+
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
+
+ pfree(new_subplan_indexes);
+ }
+
return prunestate;
}
/*
* Initialize a PartitionPruneContext for the given list of pruning steps.
+ *
+ * At least one of 'planstate' or 'econtext' must be passed to be able to
+ * successfully evaluate any non-Const expressions contained in the
+ * steps.
*/
static void
ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1767,6 +1898,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1795,8 +1927,13 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1809,171 +1946,283 @@ ExecInitPruningContext(PartitionPruneContext *context,
* pruning, disregarding any pruning constraints involving PARAM_EXEC
* Params.
*
- * If additional pruning passes will be required (because of PARAM_EXEC
- * Params), we must also update the translation data that allows conversion
- * of partition indexes into subplan indexes to account for the unneeded
- * subplans having been removed.
+ * Must only be called once per 'pruneinfo', and only if initial pruning is
+ * required.
*
- * Must only be called once per 'prunestate', and only if initial pruning
- * is required.
+ * 'params' contains information about any EXTERN parameters that might be
+ * present in the initial pruning steps.
*
- * 'nsubplans' must be passed as the total number of unpruned subplans.
+ * The RT indexes of unpruned parents are returned in *parentrelids if asked
+ * for by the caller.
*/
Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+ExecFindInitialMatchingSubPlans(PartitionPruneInfo *pruneinfo,
+ EState *estate, List *rtable,
+ ParamListInfo params,
+ Bitmapset **parentrelids)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
+ MemoryContext tmpcontext;
int i;
+ ListCell *lc;
+ int n_part_hierarchies;
+ bool free_estate = false;
+ ExprContext *econtext;
+ PartitionPruningData **partprunedata;
+ PartitionDirectory pdir;
- /* Caller error if we get here without do_initial_prune */
- Assert(prunestate->do_initial_prune);
-
- /*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
- */
- oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
-
- /*
- * For each hierarchy, do the pruning tests, and add nondeletable
- * subplans' indexes to "result".
- */
- for (i = 0; i < prunestate->num_partprunedata; i++)
- {
- PartitionPruningData *prunedata;
- PartitionedRelPruningData *pprune;
+ /* Caller error if we get here without contains_init_steps */
+ Assert(pruneinfo->contains_init_steps);
- prunedata = prunestate->partprunedata[i];
- pprune = &prunedata->partrelprunedata[0];
- /* Perform pruning without using PARAM_EXEC Params */
- find_matching_subplans_recurse(prunedata, pprune, true, &result);
+ if (parentrelids)
+ *parentrelids = NULL;
- /* Expression eval may have used space in node's ps_ExprContext too */
- if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ /* Set up EState if not in the executor proper. */
+ if (estate == NULL)
+ {
+ estate = CreateExecutorState();
+ estate->es_param_list_info = params;
+ free_estate = true;
}
- /* Add in any subplans that partition pruning didn't account for */
- result = bms_add_members(result, prunestate->other_subplans);
-
- MemoryContextSwitchTo(oldcontext);
+ /* An ExprContext to evaluate expressions. */
+ econtext = CreateExprContext(estate);
- /* Copy result out of the temp context before we reset it */
- result = bms_copy(result);
+ /* PartitionDirectory, creating one if not there already. */
+ pdir = estate->es_partition_directory;
+ if (pdir == NULL)
+ {
+ /* Omits detached partitions, just like in the executor proper. */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+ estate->es_partition_directory = pdir;
+ }
- MemoryContextReset(prunestate->prune_context);
+ /* A temporary context to allocate stuff needed to run pruning steps. */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
/*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
+ * Stuff that follows matches exactly what ExecCreatePartitionPruneState()
+ * does, except we don't need a PartitionPruneState here, so don't call
+ * that function.
*
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
+ * XXX some refactoring might be good.
*/
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
+
+ /* PartitionPruningData for each partition hierarchy. */
+ n_part_hierarchies = list_length(pruneinfo->prune_infos);
+ Assert(n_part_hierarchies > 0);
+ partprunedata = (PartitionPruningData **)
+ palloc(sizeof(PartitionPruningData *) * n_part_hierarchies);
+ i = 0;
+ foreach(lc, pruneinfo->prune_infos)
{
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
+ PartitionPruningData *prunedata;
+ List *partrelpruneinfos = lfirst_node(List, lc);
+ int npartrelpruneinfos = list_length(partrelpruneinfos);
+ ListCell *lc2;
+ int j;
- /*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
- */
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
- {
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
+ /* PartitionedRelPruningData per parent in the hierarchy. */
+ prunedata = (PartitionPruningData *)
+ palloc(offsetof(PartitionPruningData, partrelprunedata) +
+ npartrelpruneinfos * sizeof(PartitionedRelPruningData));
+ partprunedata[i] = prunedata;
+ prunedata->num_partrelprunedata = npartrelpruneinfos;
- /*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
- */
- for (i = 0; i < prunestate->num_partprunedata; i++)
+ j = 0;
+ foreach(lc2, partrelpruneinfos)
{
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
+ PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ RangeTblEntry *partrte = rt_fetch(pinfo->rtindex, rtable);
+ Relation partrel;
+ PartitionDesc partdesc;
+ PartitionKey partkey;
/*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
+ * We can rely on the copies of the partitioned table's partition
+ * key and partition descriptor appearing in its relcache entry,
+ * because that entry will be held open and locked while the
+ * PartitionedRelPruningData is in use.
*/
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
+ partrel = table_open(partrte->relid, partrte->rellockmode);
+ partkey = RelationGetPartitionKey(partrel);
+ partdesc = PartitionDirectoryLookup(pdir, partrel);
+
+ /*
+ * Initialize the subplan_map and subpart_map.
+ *
+ * Because we request detached partitions to be included, and
+ * detaching waits for old transactions, it is safe to assume that
+ * no partitions have disappeared since this query was planned.
+ *
+ * However, new partitions may have been added.
+ */
+ Assert(partdesc->nparts >= pinfo->nparts);
+ pprune->nparts = partdesc->nparts;
+ pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ if (partdesc->nparts == pinfo->nparts)
{
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
+ /*
+ * There are no new partitions, so this is simple. We can
+ * simply point to the subpart_map from the plan, but we must
+ * copy the subplan_map since we may change it later.
+ */
+ pprune->subpart_map = pinfo->subpart_map;
+ memcpy(pprune->subplan_map, pinfo->subplan_map,
+ sizeof(int) * pinfo->nparts);
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
+ /*
+ * Double-check that the list of unpruned relations has not
+ * changed. (Pruned partitions are not in relid_map[].)
+ */
+#ifdef USE_ASSERT_CHECKING
+ for (int k = 0; k < pinfo->nparts; k++)
+ {
+ Assert(partdesc->oids[k] == pinfo->relid_map[k] ||
+ pinfo->subplan_map[k] == -1);
+ }
+#endif
+ }
+ else
+ {
+ int pd_idx = 0;
+ int pp_idx;
- for (k = 0; k < nparts; k++)
+ /*
+ * Some new partitions have appeared since plan time, and
+ * those are reflected in our PartitionDesc but were not
+ * present in the one used to construct subplan_map and
+ * subpart_map. So we must construct new and longer arrays
+ * where the partitions that were originally present map to
+ * the same sub-structures, and any added partitions map to
+ * -1, as if the new partitions had been pruned.
+ *
+ * Note: pinfo->relid_map[] may contain InvalidOid entries for
+ * partitions pruned by the planner. We cannot tell exactly
+ * which of the partdesc entries these correspond to, but we
+ * don't have to; just skip over them. The non-pruned
+ * relid_map entries, however, had better be a subset of the
+ * partdesc entries and in the same order.
+ */
+ pprune->subpart_map = palloc(sizeof(int) * partdesc->nparts);
+ for (pp_idx = 0; pp_idx < partdesc->nparts; pp_idx++)
{
- int oldidx = pprune->subplan_map[k];
- int subidx;
+ /* Skip any InvalidOid relid_map entries */
+ while (pd_idx < pinfo->nparts &&
+ !OidIsValid(pinfo->relid_map[pd_idx]))
+ pd_idx++;
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
+ if (pd_idx < pinfo->nparts &&
+ pinfo->relid_map[pd_idx] == partdesc->oids[pp_idx])
{
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
-
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
+ /* match... */
+ pprune->subplan_map[pp_idx] =
+ pinfo->subplan_map[pd_idx];
+ pprune->subpart_map[pp_idx] =
+ pinfo->subpart_map[pd_idx];
+ pd_idx++;
}
- else if ((subidx = pprune->subpart_map[k]) >= 0)
+ else
{
- PartitionedRelPruningData *subprune;
-
- subprune = &prunedata->partrelprunedata[subidx];
-
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
+ /* this partdesc entry is not in the plan */
+ pprune->subplan_map[pp_idx] = -1;
+ pprune->subpart_map[pp_idx] = -1;
}
}
+
+ /*
+ * It might seem that we need to skip any trailing InvalidOid
+ * entries in pinfo->relid_map before checking that we scanned
+ * all of the relid_map. But we will have skipped them above,
+ * because they must correspond to some partdesc->oids
+ * entries; we just couldn't tell which.
+ */
+ if (pd_idx != pinfo->nparts)
+ elog(ERROR, "could not match partition child tables to plan elements");
}
+
+ /* present_parts is also subject to later modification */
+ pprune->present_parts = bms_copy(pinfo->present_parts);
+ pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
+ if (pprune->initial_pruning_steps)
+ ExecInitPruningContext(&pprune->initial_context,
+ pprune->initial_pruning_steps,
+ partdesc, partkey, NULL, econtext);
+
+ table_close(partrel, NoLock);
+ j++;
}
+ i++;
+ }
+
+ /*
+ * For each hierarchy, do the pruning tests, and add nondeletable
+ * subplans' indexes to result.
+ */
+ for (i = 0; i < n_part_hierarchies; i++)
+ {
+ PartitionPruningData *prunedata = partprunedata[i];
+ PartitionedRelPruningData *pprune;
/*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
*/
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
+ pprune = &prunedata->partrelprunedata[0];
+ find_matching_subplans_recurse(prunedata, pprune, true, &result);
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ /*
+ * Collect the RT indexes of surviving parents if the callers asked
+ * to see them.
+ */
+ if (parentrelids)
+ {
+ int j;
+ List *partrelpruneinfos = list_nth_node(List,
+ pruneinfo->prune_infos,
+ i);
- pfree(new_subplan_indexes);
+ for (j = 0; j < prunedata->num_partrelprunedata; j++)
+ {
+ PartitionedRelPruneInfo *pinfo = list_nth_node(PartitionedRelPruneInfo,
+ partrelpruneinfos, j);
+
+ pprune = &prunedata->partrelprunedata[j];
+ if (!bms_is_empty(pprune->present_parts))
+ *parentrelids = bms_add_member(*parentrelids, pinfo->rtindex);
+ }
+ }
+
+ /* Release space used up in our ExprContext. */
+ ResetExprContext(econtext);
+ }
+
+ /* Add in any subplans that partition pruning didn't account for. */
+ result = bms_add_members(result, pruneinfo->other_subplans);
+
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Copy result out of the temp context before we reset it */
+ result = bms_copy(result);
+ if (parentrelids)
+ *parentrelids = bms_copy(*parentrelids);
+
+ /* Safe to drop the temporary context */
+ MemoryContextDelete(tmpcontext);
+
+ /* Free the ExprContext, and EState if needed. */
+ FreeExprContext(econtext, true);
+ if (free_estate)
+ {
+ FreeExecutorState(estate);
+ estate = NULL;
}
return result;
@@ -2018,6 +2267,11 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
/* Expression eval may have used space in node's ps_ExprContext too */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 6a2daa6e76..7f813476ab 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -136,24 +136,15 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
- PartitionPruneState *prunestate;
-
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
- appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
+ if (node->part_prune_info->contains_init_steps)
{
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
+ validsubplans =
+ ExecFindInitialMatchingSubPlans(node->part_prune_info,
+ estate, estate->es_range_table,
+ estate->es_param_list_info,
+ NULL);
nplans = bms_num_members(validsubplans);
+ Assert(nplans >= 0);
}
else
{
@@ -163,12 +154,26 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
validsubplans = bms_add_range(NULL, 0, nplans - 1);
}
+ /* Create the working data structure for run-time pruning. */
+ if (node->part_prune_info->contains_exec_steps)
+ {
+ PartitionPruneState *prunestate;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, &appendstate->ps);
+ prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
+ node->part_prune_info,
+ validsubplans,
+ list_length(node->appendplans));
+
+ appendstate->as_prune_state = prunestate;
+ }
/*
* When no run-time pruning is required and there's at least one
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ else
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 617bffb206..51c5c3433d 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -84,23 +84,15 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
/* If run-time partition pruning is enabled, then set that up now */
if (node->part_prune_info != NULL)
{
- PartitionPruneState *prunestate;
-
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
- mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
+ if (node->part_prune_info->contains_init_steps)
{
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
+ validsubplans =
+ ExecFindInitialMatchingSubPlans(node->part_prune_info,
+ estate, estate->es_range_table,
+ estate->es_param_list_info,
+ NULL);
nplans = bms_num_members(validsubplans);
+ Assert(nplans >= 0);
}
else
{
@@ -110,13 +102,28 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
validsubplans = bms_add_range(NULL, 0, nplans - 1);
}
+ /* Create the working data structure for run-time pruning. */
+ if (node->part_prune_info->contains_exec_steps)
+ {
+ PartitionPruneState *prunestate;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, &mergestate->ps);
+ prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
+ node->part_prune_info,
+ validsubplans,
+ list_length(node->mergeplans));
+
+ mergestate->ms_prune_state = prunestate;
+ }
/*
* When no run-time pruning is required and there's at least one
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ else
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
+
}
else
{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index df0b747883..57f2fce3d4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -94,9 +94,11 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(usesPreExecPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(relationRTIs);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -1277,6 +1279,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(contains_init_steps);
+ COPY_SCALAR_FIELD(contains_exec_steps);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index e276264882..a13ee087a8 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -31,7 +31,10 @@ static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
-
+static bool plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context);
+static bool plan_walk_members(List *plans, bool (*walker) (), void *context);
/*
* exprType -
@@ -4105,3 +4108,119 @@ planstate_walk_members(PlanState **planstates, int nplans,
return false;
}
+
+/*
+ * plan_tree_walker --- walk plantrees
+ *
+ * The walker has already visited the current node, and so we need only
+ * recurse into any sub-nodes it has.
+ */
+bool
+plan_tree_walker(Plan *plan,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+
+ /* Guard against stack overflow due to overly complex plan trees */
+ check_stack_depth();
+
+ /* initPlan-s */
+ if (plan_walk_subplans(plan->initPlan, walker, context))
+ return true;
+
+ /* lefttree */
+ if (outerPlan(plan))
+ {
+ if (walker(outerPlan(plan), context))
+ return true;
+ }
+
+ /* righttree */
+ if (innerPlan(plan))
+ {
+ if (walker(innerPlan(plan), context))
+ return true;
+ }
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_Append:
+ if (plan_walk_members(((Append *) plan)->appendplans,
+ walker, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (plan_walk_members(((MergeAppend *) plan)->mergeplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapAnd:
+ if (plan_walk_members(((BitmapAnd *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapOr:
+ if (plan_walk_members(((BitmapOr *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_SubqueryScan:
+ if (walker(((SubqueryScan *) plan)->subplan, context))
+ return true;
+ break;
+ case T_CustomScan:
+ foreach(lc, ((CustomScan *) plan)->custom_plans)
+ {
+ if (walker((Plan *) lfirst(lc), context))
+ return true;
+ }
+ break;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Walk a list of SubPlans (or initPlans, which also use SubPlan nodes).
+ */
+static bool
+plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+ PlannedStmt *plannedstmt = (PlannedStmt *) context;
+
+ foreach(lc, plans)
+ {
+ SubPlan *sp = lfirst_node(SubPlan, lc);
+ Plan *p = list_nth(plannedstmt->subplans, sp->plan_id - 1);
+
+ if (walker(p, context))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Walk the constituent plans of an Append, MergeAppend, BitmapAnd, or
+ * BitmapOr node.
+ */
+static bool
+plan_walk_members(List *plans, bool (*walker) (), void *context)
+{
+ ListCell *lc;
+
+ foreach(lc, plans)
+ {
+ if (walker(lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 91a89b6d51..8364633d2e 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,9 +312,11 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(usesPreExecPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(relationRTIs);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -1003,6 +1005,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(contains_init_steps);
+ WRITE_BOOL_FIELD(contains_exec_steps);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2273,6 +2277,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(subplans);
WRITE_BITMAPSET_FIELD(rewindPlanIDs);
WRITE_NODE_FIELD(finalrtable);
+ WRITE_BITMAPSET_FIELD(relationRTIs);
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index d79af6e56e..df06782c3c 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1585,9 +1585,11 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(usesPreExecPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(relationRTIs);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -2533,6 +2535,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(contains_init_steps);
+ READ_BOOL_FIELD(contains_exec_steps);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd01ec0526..37a07cb258 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,8 +517,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->usesPreExecPruning = glob->usesPreExecPruning;
result->planTree = top_plan;
result->rtable = glob->finalrtable;
+ result->relationRTIs = glob->relationRTIs;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 6ccec759bd..4616dc675d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -483,6 +483,7 @@ static void
add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
{
RangeTblEntry *newrte;
+ Index rti = list_length(glob->finalrtable) + 1;
/* flat copy to duplicate all the scalar fields */
newrte = (RangeTblEntry *) palloc(sizeof(RangeTblEntry));
@@ -517,7 +518,10 @@ add_rte_to_flat_rtable(PlannerGlobal *glob, RangeTblEntry *rte)
* but it would probably cost more cycles than it would save.
*/
if (newrte->rtekind == RTE_RELATION)
+ {
+ glob->relationRTIs = bms_add_member(glob->relationRTIs, rti);
glob->relationOids = lappend_oid(glob->relationOids, newrte->relid);
+ }
}
/*
@@ -1515,6 +1519,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (aplan->part_prune_info->contains_init_steps)
+ root->glob->usesPreExecPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1579,6 +1586,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->contains_init_steps)
+ root->glob->usesPreExecPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index e00edbe5c8..d2874f716e 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *contains_init_steps,
+ bool *contains_exec_steps);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool contains_init_steps = false;
+ bool contains_exec_steps = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_contains_init_steps,
+ partrel_contains_exec_steps;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_contains_init_steps,
+ &partrel_contains_exec_steps);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!contains_init_steps)
+ contains_init_steps = partrel_contains_init_steps;
+ if (!contains_exec_steps)
+ contains_exec_steps = partrel_contains_exec_steps;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->contains_init_steps = contains_init_steps;
+ pruneinfo->contains_exec_steps = contains_exec_steps;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *contains_init_steps and *contains_exec_steps are set to indicate
+ * that the returned PartitionedRelPruneInfos contain pruning steps
+ * that can be performed before and during execution, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *contains_init_steps,
+ bool *contains_exec_steps)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *contains_init_steps = false;
+ *contains_exec_steps = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*contains_init_steps)
+ *contains_init_steps = (initial_pruning_steps != NIL);
+ if (!*contains_exec_steps)
+ *contains_exec_steps = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -798,6 +829,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +840,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3686,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3709,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is the same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 6767eae8f2..6161907ace 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -58,6 +58,7 @@
#include "access/transam.h"
#include "catalog/namespace.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
@@ -99,14 +100,26 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, bool acquire,
+ ParamListInfo boundParams);
+struct GetLockableRelations_context
+{
+ PlannedStmt *plannedstmt;
+ Bitmapset *relations;
+ ParamListInfo params;
+};
+static Bitmapset *GetLockableRelations(PlannedStmt *plannedstmt,
+ ParamListInfo boundParams);
+static bool GetLockableRelations_worker(Plan *plan,
+ struct GetLockableRelations_context *context);
+static Bitmapset *get_plan_scanrelids(Plan *plan);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -792,7 +805,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -826,7 +839,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocks(plan->stmt_list, true, boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +861,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocks(plan->stmt_list, false, boundParams);
}
/*
@@ -1160,7 +1173,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1366,7 +1379,6 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
foreach(lc, plan->stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc);
- ListCell *lc2;
if (plannedstmt->commandType == CMD_UTILITY)
return false;
@@ -1375,13 +1387,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
* We have to grovel through the rtable because it's likely to contain
* an RTE_RESULT relation, rather than being totally empty.
*/
- foreach(lc2, plannedstmt->rtable)
- {
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
-
- if (rte->rtekind == RTE_RELATION)
- return false;
- }
+ if (!bms_is_empty(plannedstmt->relationRTIs))
+ return false;
}
/*
@@ -1740,14 +1747,15 @@ QueryListGetPrimaryStmt(List *stmts)
* or release them if acquire is false.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, bool acquire, ParamListInfo boundParams)
{
ListCell *lc1;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ Bitmapset *relations;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1765,9 +1773,22 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Fetch the RT indexes of only the relations that will actually be
+ * scanned when the plan is executed. This skips the scan nodes under
+ * any Append/MergeAppend nodes in the plan tree that "initial" pruning
+ * eliminates. It does so by performing
+ * ExecFindInitialMatchingSubPlans() to run any pruning steps
+ * contained in those nodes that can be safely run at this point, using
+ * 'boundParams' to evaluate any EXTERN parameters contained in the
+ * steps.
+ */
+ relations = GetLockableRelations(plannedstmt, boundParams);
+
+ rti = -1;
+ while ((rti = bms_next_member(relations, rti)) >= 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1786,6 +1807,166 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * GetLockableRelations
+ * Returns set of RT indexes of relations that must be locked by
+ * AcquireExecutorLocks()
+ */
+static Bitmapset *
+GetLockableRelations(PlannedStmt *plannedstmt, ParamListInfo boundParams)
+{
+ ListCell *lc;
+ struct GetLockableRelations_context context;
+
+ /* None of the relation scanning nodes are prunable here. */
+ if (!plannedstmt->usesPreExecPruning)
+ return plannedstmt->relationRTIs;
+
+ /*
+ * Look for prunable nodes in the main plan tree, followed by those in
+ * subplans.
+ */
+ context.plannedstmt = plannedstmt;
+ context.params = boundParams;
+ context.relations = NULL;
+
+ (void) GetLockableRelations_worker(plannedstmt->planTree, &context);
+
+ foreach(lc, plannedstmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ (void) GetLockableRelations_worker(subplan, &context);
+ }
+
+ return context.relations;
+}
+
+/*
+ * GetLockableRelations_worker
+ * Adds RT indexes of relations to be scanned by plan to
+ * context->relations
+ *
+ * For plan node types that support pruning, this only adds child plan
+ * subnodes that satisfy the "initial" pruning steps.
+ */
+static bool
+GetLockableRelations_worker(Plan *plan,
+ struct GetLockableRelations_context *context)
+{
+ if (plan == NULL)
+ return false;
+
+ switch(nodeTag(plan))
+ {
+ /* Nodes scanning a relation or relations. */
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ context->relations = bms_add_member(context->relations,
+ ((Scan *) plan)->scanrelid);
+ return false;
+ case T_ForeignScan:
+ context->relations = bms_add_members(context->relations,
+ ((ForeignScan *) plan)->fs_relids);
+ return false;
+ case T_CustomScan:
+ context->relations = bms_add_members(context->relations,
+ ((CustomScan *) plan)->custom_relids);
+ return false;
+
+ /* Nodes containing prunable subnodes. */
+ case T_Append:
+ case T_MergeAppend:
+ {
+ PlannedStmt *plannedstmt = context->plannedstmt;
+ List *rtable = plannedstmt->rtable;
+ ParamListInfo params = context->params;
+ PartitionPruneInfo *pruneinfo;
+ Bitmapset *validsubplans;
+ Bitmapset *parentrelids;
+
+ pruneinfo = IsA(plan, Append) ?
+ ((Append *) plan)->part_prune_info :
+ ((MergeAppend *) plan)->part_prune_info;
+
+ if (pruneinfo && pruneinfo->contains_init_steps)
+ {
+ int i;
+ List *subplans = IsA(plan, Append) ?
+ ((Append *) plan)->appendplans :
+ ((MergeAppend *) plan)->mergeplans;
+
+ validsubplans =
+ ExecFindInitialMatchingSubPlans(pruneinfo,
+ NULL, rtable,
+ params,
+ &parentrelids);
+
+ /* All relevant parents must be locked. */
+ Assert(bms_num_members(parentrelids) > 0);
+ context->relations = bms_add_members(context->relations,
+ parentrelids);
+
+ /* And all leaf partitions that will be scanned. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ context->relations =
+ bms_add_members(context->relations,
+ get_plan_scanrelids(subplan));
+ }
+
+ return false;
+ }
+ }
+ break;
+
+ default:
+ break;
+ }
+
+ return plan_tree_walker(plan, GetLockableRelations_worker,
+ (void *) context);
+}
+
+/*
+ * get_plan_scanrelids
+ * Returns RT indexes of the relation(s) scanned by plan
+ */
+static Bitmapset *
+get_plan_scanrelids(Plan *plan)
+{
+ if (plan == NULL)
+ return NULL;
+
+ switch(nodeTag(plan))
+ {
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ return bms_make_singleton(((Scan *) plan)->scanrelid);
+ case T_ForeignScan:
+ return ((ForeignScan *) plan)->fs_relids;
+ case T_CustomScan:
+ return ((CustomScan *) plan)->custom_relids;
+ default:
+ break;
+ }
+
+ return NULL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 694e38b7dd..0eeaf3e79d 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -90,8 +90,6 @@ typedef struct PartitionPruningData
* These must not be pruned.
* prune_context A short-lived memory context in which to execute the
* partition pruning functions.
- * do_initial_prune true if pruning should be performed during executor
- * startup (at any hierarchy level).
* do_exec_prune true if pruning should be performed during
* executor run (at any hierarchy level).
* num_partprunedata Number of items in "partprunedata" array.
@@ -104,7 +102,6 @@ typedef struct PartitionPruneState
Bitmapset *execparamids;
Bitmapset *other_subplans;
MemoryContext prune_context;
- bool do_initial_prune;
bool do_exec_prune;
int num_partprunedata;
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
@@ -120,9 +117,13 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
+ PartitionPruneInfo *partitionpruneinfo,
+ Bitmapset *initially_valid_subplans,
+ int nsubplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
-extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
+extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneInfo *pruneinfo,
+ EState *estate, List *rtable,
+ ParamListInfo params,
+ Bitmapset **parentrelids);
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 03a346c01d..8b985a4706 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 324d92880b..d041b4d924 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -101,6 +101,9 @@ typedef struct PlannerGlobal
List *finalrtable; /* "flat" rangetable for executor */
+ Bitmapset *relationRTIs; /* Indexes of RTE_RELATION entries in range
+ * table */
+
List *finalrowmarks; /* "flat" list of PlanRowMarks */
List *resultRelations; /* "flat" list of integer RT indexes */
@@ -129,6 +132,9 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool usesPreExecPruning; /* Do some Plan nodes use pre-execution
+ * partition pruning */
+
PartitionDirectory partition_directory; /* partition descriptors */
} PlannerGlobal;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index be3c30704a..23bf04578b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -59,12 +59,18 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool usesPreExecPruning; /* Do some Plan nodes use pre-execution
+ * partition pruning */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *relationRTIs; /* Indexes of RTE_RELATION entries in range
+ * table */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1157,6 +1163,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * contains_init_steps Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * contains_exec_steps Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1165,6 +1178,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool contains_init_steps;
+ bool contains_exec_steps;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 5f51e73a4d..1c9c408f00 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,8 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use during pre-execution pruning; planstate
+ * would be NULL in that case.
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +58,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
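
To make the lock-collection idea in the patch easier to follow, here is a
minimal standalone sketch. It is not PostgreSQL code: ToyPlan, add_rti() and
toy_lockable_relations() are hypothetical names invented for illustration, the
set of RT indexes is a plain 64-bit mask rather than a Bitmapset, and the
result of "initial" pruning is assumed to be precomputed in the survivors
field instead of coming from ExecFindInitialMatchingSubPlans(). As a further
simplification, the toy always locks the partitioned parent of an Append.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these types exist in PostgreSQL. */
typedef enum { TOY_SCAN, TOY_APPEND } ToyNodeKind;

typedef struct ToyPlan
{
    ToyNodeKind kind;
    int         scanrelid;      /* TOY_SCAN: RT index of the scanned relation */
    int         parentrelid;    /* TOY_APPEND: RT index of the partitioned parent */
    bool        has_init_steps; /* TOY_APPEND: prunable before execution? */
    uint64_t    survivors;      /* TOY_APPEND: bit i set => child i survives */
    int         nchildren;      /* TOY_APPEND: number of child subplans */
    struct ToyPlan **children;  /* TOY_APPEND: child subplans */
} ToyPlan;

/* Add RT index rti to a set represented as a 64-bit mask. */
static uint64_t
add_rti(uint64_t set, int rti)
{
    assert(rti >= 0 && rti < 64);
    return set | (UINT64_C(1) << rti);
}

/* Collect the RT indexes that would need a lock for this plan subtree. */
static uint64_t
toy_lockable_relations(const ToyPlan *plan, uint64_t set)
{
    if (plan == NULL)
        return set;

    if (plan->kind == TOY_SCAN)
        return add_rti(set, plan->scanrelid);

    /* Append: lock the parent (a simplification made by this sketch). */
    assert(plan->nchildren >= 0 && plan->nchildren <= 64);
    set = add_rti(set, plan->parentrelid);

    for (int i = 0; i < plan->nchildren; i++)
    {
        /* Skip children that "initial" pruning has eliminated. */
        if (plan->has_init_steps &&
            !(plan->survivors & (UINT64_C(1) << i)))
            continue;
        set = toy_lockable_relations(plan->children[i], set);
    }

    return set;
}
```

With three child scans and only one of them surviving, the returned mask
contains the parent and that one child; with has_init_steps set to false,
every child is included, which corresponds to locking all partitions.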
* Re: generic plans and "initial" pruning
@ 2021-12-28 13:12 Ashutosh Bapat <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Ashutosh Bapat @ 2021-12-28 13:12 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: pgsql-hackers
On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <[email protected]> wrote:
>
> Executing generic plans involving partitions is known to become slower
> as partition count grows due to a number of bottlenecks, with
> AcquireExecutorLocks() showing at the top in profiles.
>
> Previous attempt at solving that problem was by David Rowley [1],
> where he proposed delaying locking of *all* partitions appearing under
> an Append/MergeAppend until "initial" pruning is done during the
> executor initialization phase. A problem with that approach that he
> has described in [2] is that leaving partitions unlocked can lead to
> race conditions where the Plan node belonging to a partition can be
> invalidated when a concurrent session successfully alters the
> partition between AcquireExecutorLocks() saying the plan is okay to
> execute and then actually executing it.
>
> However, using an idea that Robert suggested to me off-list a little
> while back, it seems possible to determine the set of partitions that
> we can safely skip locking. The idea is to look at the "initial" or
> "pre-execution" pruning instructions contained in a given Append or
> MergeAppend node when AcquireExecutorLocks() is collecting the
> relations to lock and consider relations from only those sub-nodes
> that survive performing those instructions. I've attempted
> implementing that idea in the attached patch.
>
In which cases will we have "pre-execution" pruning instructions that
can be used to skip locking partitions? Can you please give a few
examples where this approach will be useful?
The benchmark is showing good results, indeed.
--
Best Wishes,
Ashutosh Bapat
* Re: generic plans and "initial" pruning
@ 2021-12-31 02:26 Amit Langote <[email protected]>
parent: Ashutosh Bapat <[email protected]>
0 siblings, 0 replies; 82+ messages in thread
From: Amit Langote @ 2021-12-31 02:26 UTC (permalink / raw)
To: Ashutosh Bapat <[email protected]>; +Cc: pgsql-hackers
On Tue, Dec 28, 2021 at 22:12 Ashutosh Bapat <[email protected]> wrote:
> On Sat, Dec 25, 2021 at 9:06 AM Amit Langote <[email protected]> wrote:
> >
> > Executing generic plans involving partitions is known to become slower
> > as partition count grows due to a number of bottlenecks, with
> > AcquireExecutorLocks() showing at the top in profiles.
> >
> > Previous attempt at solving that problem was by David Rowley [1],
> > where he proposed delaying locking of *all* partitions appearing under
> > an Append/MergeAppend until "initial" pruning is done during the
> > executor initialization phase. A problem with that approach that he
> > has described in [2] is that leaving partitions unlocked can lead to
> > race conditions where the Plan node belonging to a partition can be
> > invalidated when a concurrent session successfully alters the
> > partition between AcquireExecutorLocks() saying the plan is okay to
> > execute and then actually executing it.
> >
> > However, using an idea that Robert suggested to me off-list a little
> > while back, it seems possible to determine the set of partitions that
> > we can safely skip locking. The idea is to look at the "initial" or
> > "pre-execution" pruning instructions contained in a given Append or
> > MergeAppend node when AcquireExecutorLocks() is collecting the
> > relations to lock and consider relations from only those sub-nodes
> > that survive performing those instructions. I've attempted
> > implementing that idea in the attached patch.
> >
>
> In which cases will we have "pre-execution" pruning instructions that
> can be used to skip locking partitions? Can you please give a few
> examples where this approach will be useful?
This is mainly useful for prepared queries, for example:
prepare q as select * from partitioned_table where key = $1;
And even then, only when execute q(…) uses a generic plan. Generic plans
are problematic because they must contain nodes for all partitions
(without any plan-time pruning), which means CheckCachedPlan() has to
spend time proportional to the number of partitions to determine that the
plan is still usable, that is, has not been invalidated; most of that
time is spent in AcquireExecutorLocks().
Other bottlenecks, not addressed in this patch, pertain to some executor
startup/shutdown subroutines that process the range table of a PlannedStmt
in its entirety; its length is also proportional to the number of
partitions when the plan is generic.
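To make the cost difference concrete, here is a minimal toy model in C. It
is only a sketch under stated assumptions: ToyPartition, toy_lock_all() and
toy_lock_after_pruning() are hypothetical names invented for illustration,
range partitioning on a single integer key is assumed, and none of this is
code from the patch or from PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one partition's subplan in a generic plan. */
typedef struct ToyPartition
{
    int  key_lo;    /* inclusive lower bound of the partition's range */
    int  key_hi;    /* exclusive upper bound of the partition's range */
    bool locked;    /* set when the toy "lock" has been taken */
} ToyPartition;

/* Current behavior, modeled: every partition in the plan gets locked. */
int
toy_lock_all(ToyPartition *parts, size_t nparts)
{
    int nlocked = 0;

    assert(parts != NULL || nparts == 0);
    for (size_t i = 0; i < nparts; i++)
    {
        parts[i].locked = true;
        nlocked++;
    }
    return nlocked;
}

/*
 * Proposed behavior, modeled: evaluate the pruning condition "key = param"
 * first and lock only the partitions that survive it.
 */
int
toy_lock_after_pruning(ToyPartition *parts, size_t nparts, int param)
{
    int nlocked = 0;

    assert(parts != NULL || nparts == 0);
    for (size_t i = 0; i < nparts; i++)
    {
        if (param >= parts[i].key_lo && param < parts[i].key_hi)
        {
            parts[i].locked = true;
            nlocked++;
        }
    }
    return nlocked;
}
```

In this model the first function does work proportional to the partition
count on every execution, while the second takes only as many locks as
there are partitions matching the parameter value.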
> The benchmark is showing good results, indeed.
Thanks.
--
Amit Langote
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2022-03-28 07:17 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-03-28 07:17 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; pgsql-hackers; David Rowley *EXTERN* <[email protected]>
On Tue, Mar 22, 2022 at 9:44 PM Amit Langote <[email protected]> wrote:
> On Tue, Mar 15, 2022 at 3:19 PM Amit Langote <[email protected]> wrote:
> > On Tue, Mar 15, 2022 at 5:06 AM Robert Haas <[email protected]> wrote:
> > > On Mon, Mar 14, 2022 at 3:38 PM Tom Lane <[email protected]> wrote:
> > > > Also, while I've not spent much time at all reading this patch,
> > > > it seems rather desperately undercommented, and a lot of the
> > > > new names are unintelligible. In particular, I suspect that the
> > > > patch is significantly redesigning when/where run-time pruning
> > > > happens (unless it's just letting that be run twice); but I don't
> > > > see any documentation or name changes suggesting where that
> > > > responsibility is now.
> > >
> > > I am sympathetic to that concern. I spent a while staring at a
> > > baffling comment in 0001 only to discover it had just been moved from
> > > elsewhere. I really don't feel that things in this are as clear as
> > > they could be -- although I hasten to add that I respect the people
> > > who have done work in this area previously and am grateful for what
> > > they did. It's been a huge benefit to the project in spite of the
> > > bumps in the road. Moreover, this isn't the only code in PostgreSQL
> > > that needs improvement, or the worst. That said, I do think there are
> > > problems. I don't yet have a position on whether this patch is making
> > > that better or worse.
> >
> > Okay, I'd like to post a new version with the comments edited to make
> > them a bit more intelligible. I understand that the comments around
> > the new invocation mode(s) of runtime pruning are not as clear as they
> > should be, especially as the changes that this patch wants to make to
> > how things work are not very localized.
>
> Actually, another area where the comments may not be as clear as they
> should have been is the changes that the patch makes to the
> AcquireExecutorLocks() logic that decides which relations are locked
> to safeguard the plan tree for execution, which are those given by
> RTE_RELATION entries in the range table.
>
> Without the patch, they are found by actually scanning the range table.
>
> With the patch, it's the same set of RTEs if the plan doesn't contain
> any pruning nodes, though instead of the range table, what is scanned
> is a bitmapset of their RT indexes that is made available by the
> planner in the form of PlannedStmt.lockrels. When the plan does
> contain a pruning node (PlannedStmt.containsInitialPruning), the
> bitmapset is constructed by calling ExecutorGetLockRels() on the plan
> tree, which walks it to add RT indexes of relations mentioned in the
> Scan nodes, while skipping any nodes that are pruned after performing
> initial pruning steps that may be present in their containing parent
> node's PartitionPruneInfo. The RT indexes of partitioned tables
> that are present in the PartitionPruneInfo itself are also added to
> the set.
>
> While expanding comments added by the patch to make this clear, I
> realized that there are two problems, one of them quite glaring:
>
> * Planner's constructing this bitmapset and its copying along with the
> PlannedStmt is pure overhead in the cases that this patch has nothing
> to do with, which is the kind of thing that Andres cautioned against
> upthread.
>
> * Not all partitioned tables that would have been locked without the
> patch to come up with an Append/MergeAppend plan may be returned by
> ExecutorGetLockRels(). For example, if none of the query's
> runtime-prunable quals were found to match the partition key of an
> intermediate partitioned table, that partitioned table is not
> included in the PartitionPruneInfo. Or if an Append/MergeAppend
> covering a partition tree doesn't contain any PartitionPruneInfo to
> begin with, only the leaf partitions and none of the partitioned
> parents would be accounted for by the ExecutorGetLockRels() logic.
>
> The 1st one seems easy to fix by not inventing PlannedStmt.lockrels
> and just doing what's being done now: scan the range table if
> (!PlannedStmt.containsInitialPruning).
The attached updated patch does it like this.
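Roughly, the two-way decision described above can be sketched as follows.
This is a toy model only: ToyStmt, ToyScan and toy_get_lock_rels() are
hypothetical names, a 64-bit mask stands in for a Bitmapset, the plan tree
is flattened into an array of scans, and the handling of partitioned
parents (which 0002 deals with) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { TOY_RTE_RELATION, TOY_RTE_SUBQUERY } ToyRTEKind;

/* Hypothetical leaf scan: its RT index and whether initial pruning removed it. */
typedef struct ToyScan
{
    int  rtindex;
    bool pruned;
} ToyScan;

/* Hypothetical stand-in for the parts of a PlannedStmt used here. */
typedef struct ToyStmt
{
    bool              contains_initial_pruning;
    const ToyRTEKind *rtable;   /* RT index i is rtable[i - 1] */
    int               nrtes;
    const ToyScan    *scans;    /* flattened stand-in for the plan tree */
    int               nscans;
} ToyStmt;

/* Returns a bitmask whose bit i is set if RT index i must be locked. */
uint64_t
toy_get_lock_rels(const ToyStmt *stmt)
{
    uint64_t lockrels = 0;

    assert(stmt->nrtes < 64);
    if (!stmt->contains_initial_pruning)
    {
        /* No initial pruning anywhere: scan the range table, as today. */
        for (int i = 0; i < stmt->nrtes; i++)
        {
            if (stmt->rtable[i] == TOY_RTE_RELATION)
                lockrels |= UINT64_C(1) << (i + 1);
        }
    }
    else
    {
        /* Otherwise consider only the scans that survive initial pruning. */
        for (int i = 0; i < stmt->nscans; i++)
        {
            assert(stmt->scans[i].rtindex > 0 && stmt->scans[i].rtindex < 64);
            if (!stmt->scans[i].pruned)
                lockrels |= UINT64_C(1) << stmt->scans[i].rtindex;
        }
    }
    return lockrels;
}
```

The point of keeping the first branch is that plans without any initial
pruning pay nothing extra: no additional set is built by the planner or
copied along with the statement.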
> The only way perhaps to fix the second one is to reconsider the
> decision we made in the following commit:
>
> commit 52ed730d511b7b1147f2851a7295ef1fb5273776
> Author: Tom Lane <[email protected]>
> Date: Sun Oct 7 14:33:17 2018 -0400
>
> Remove some unnecessary fields from Plan trees.
>
> In the wake of commit f2343653f, we no longer need some fields that
> were used before to control executor lock acquisitions:
>
> * PlannedStmt.nonleafResultRelations can go away entirely.
>
> * partitioned_rels can go away from Append, MergeAppend, and ModifyTable.
> However, ModifyTable still needs to know the RT index of the partition
> root table if any, which was formerly kept in the first entry of that
> list. Add a new field "rootRelation" to remember that. rootRelation is
> partly redundant with nominalRelation, in that if it's set it will have
> the same value as nominalRelation. However, the latter field has a
> different purpose so it seems best to keep them distinct.
>
> That is, add back the partitioned_rels field, at least to Append and
> MergeAppend, to store the RT indexes of partitioned tables whose
> children's paths are present in Append/MergeAppend.subpaths.
I have implemented this in the attached 0002, which reintroduces
partitioned_rels in Append/MergeAppend nodes as a bitmapset of RT
indexes. The set contains the RT indexes of partitioned ancestors
whose expansion produced the leaf partitions that a given
Append/MergeAppend node scans. This project needs a way of knowing
the partitioned tables involved in producing an Append/MergeAppend
node, because we'd like to give plancache.c the ability to glean the
set of relations to be locked, to make the tree ready for execution,
by scanning the plan tree rather than the range table, and the only
relations missing from the tree right now are partitioned tables.
One fly in the ointment I ran into when doing that is the fact that
setrefs.c in most situations removes the Append/MergeAppend from the
final plan if it contains only one child subplan. I got around it by
inventing a PlannerGlobal/PlannedStmt.elidedAppendPartedRels set,
which is the union of the partitioned_rels of all the
Append/MergeAppend nodes in the plan tree that were removed as
described.
Other than the changes mentioned above, the updated patch now contains
a bit more commentary than earlier versions, mostly around
AcquireExecutorLocks()'s new way of determining the set of relations
to lock and the significantly redesigned working of the "initial"
execution pruning.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/x-patch] v6-0003-Add-a-plan_tree_walker.patch (3.9K, 2-v6-0003-Add-a-plan_tree_walker.patch)
From 47a00a6b8cf695e5890fc6555e2df2980eb2115b Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Thu, 3 Mar 2022 16:04:13 +0900
Subject: [PATCH v6 3/4] Add a plan_tree_walker()
Like planstate_tree_walker() but for uninitialized plan trees.
---
src/backend/nodes/nodeFuncs.c | 116 ++++++++++++++++++++++++++++++++++
src/include/nodes/nodeFuncs.h | 3 +
2 files changed, 119 insertions(+)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index ec25aae6e3..c16f9c6b40 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -31,6 +31,10 @@ static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
+static bool plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context);
+static bool plan_walk_members(List *plans, bool (*walker) (), void *context);
/*
@@ -4150,3 +4154,115 @@ planstate_walk_members(PlanState **planstates, int nplans,
return false;
}
+
+/*
+ * plan_tree_walker --- walk plantrees
+ *
+ * The walker has already visited the current node, and so we need only
+ * recurse into any sub-nodes it has.
+ */
+bool
+plan_tree_walker(Plan *plan,
+ bool (*walker) (),
+ void *context)
+{
+ /* Guard against stack overflow due to overly complex plan trees */
+ check_stack_depth();
+
+ /* initPlan-s */
+ if (plan_walk_subplans(plan->initPlan, walker, context))
+ return true;
+
+ /* lefttree */
+ if (outerPlan(plan))
+ {
+ if (walker(outerPlan(plan), context))
+ return true;
+ }
+
+ /* righttree */
+ if (innerPlan(plan))
+ {
+ if (walker(innerPlan(plan), context))
+ return true;
+ }
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_Append:
+ if (plan_walk_members(((Append *) plan)->appendplans,
+ walker, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (plan_walk_members(((MergeAppend *) plan)->mergeplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapAnd:
+ if (plan_walk_members(((BitmapAnd *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapOr:
+ if (plan_walk_members(((BitmapOr *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_CustomScan:
+ if (plan_walk_members(((CustomScan *) plan)->custom_plans,
+ walker, context))
+ return true;
+ break;
+ case T_SubqueryScan:
+ if (walker(((SubqueryScan *) plan)->subplan, context))
+ return true;
+ break;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Walk a list of SubPlans (or initPlans, which also use SubPlan nodes).
+ */
+static bool
+plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+ PlannedStmt *plannedstmt = (PlannedStmt *) context;
+
+ foreach(lc, plans)
+ {
+ SubPlan *sp = lfirst_node(SubPlan, lc);
+ Plan *p = list_nth(plannedstmt->subplans, sp->plan_id - 1);
+
+ if (walker(p, context))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Walk the constituent plans of a ModifyTable, Append, MergeAppend,
+ * BitmapAnd, or BitmapOr node.
+ */
+static bool
+plan_walk_members(List *plans, bool (*walker) (), void *context)
+{
+ ListCell *lc;
+
+ foreach(lc, plans)
+ {
+ if (walker(lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 93c60bde66..fca107ad65 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
--
2.24.1
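For readers unfamiliar with the walker convention that 0003 follows, the
pattern can be sketched with a toy binary plan tree. Everything here
(ToyPlan, toy_plan_tree_walker(), toy_collect_scan_rels()) is hypothetical
and much simpler than the patch: the callback inspects the current node and
then hands recursion into the children back to the generic walker, and a
true return value aborts the walk early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plan node with at most two children. */
typedef struct ToyPlan
{
    bool            is_scan;   /* leaf scan node? */
    int             rtindex;   /* RT index scanned, if is_scan */
    struct ToyPlan *left;
    struct ToyPlan *right;
} ToyPlan;

typedef bool (*toy_walker_fn) (ToyPlan *plan, void *context);

/*
 * Generic part: the caller has already visited 'plan', so only recurse
 * into its children by invoking the callback on each of them.
 */
bool
toy_plan_tree_walker(ToyPlan *plan, toy_walker_fn walker, void *context)
{
    assert(plan != NULL);
    if (plan->left != NULL && walker(plan->left, context))
        return true;
    if (plan->right != NULL && walker(plan->right, context))
        return true;
    return false;
}

/* Callback: record the RT index of every scan node in a bitmask. */
bool
toy_collect_scan_rels(ToyPlan *plan, void *context)
{
    uint64_t *rels = (uint64_t *) context;

    if (plan->is_scan)
    {
        assert(plan->rtindex > 0 && plan->rtindex < 64);
        *rels |= UINT64_C(1) << plan->rtindex;
    }
    return toy_plan_tree_walker(plan, toy_collect_scan_rels, context);
}
```

A caller starts the walk by invoking the callback on the root node with a
pointer to a zeroed mask as the context.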
[application/x-patch] v6-0002-Add-Merge-Append.partitioned_rels.patch (17.4K, 3-v6-0002-Add-Merge-Append.partitioned_rels.patch)
From 8c81237402922ebf82786f3ff34972a6a3cb8c03 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Thu, 24 Mar 2022 22:47:03 +0900
Subject: [PATCH v6 2/4] Add [Merge]Append.partitioned_rels
To record the RT indexes of all partitioned ancestors leading up to
leaf partitions that are appended by the node.
If a given [Merge]Append node is left out from the plan due to there
being only one element in its list of child subplans, then its
partitioned_rels set is added to PlannerGlobal.elidedAppendPartedRels
that is passed down to the executor through PlannedStmt.
There are no users for partitioned_rels and elidedAppendPartedRels
as of this commit, though a later commit will require the ability
to extract the set of relations that must be locked to make a plan
tree safe for execution by walking the plan tree itself, so having
the partitioned tables be also present in the plan tree will be
helpful. Note that currently the executor relies on the fact that
the set of relations to be locked can be obtained by simply scanning
the range table that's made available in PlannedStmt along with the
plan tree.
---
src/backend/nodes/copyfuncs.c | 3 +++
src/backend/nodes/outfuncs.c | 5 +++++
src/backend/nodes/readfuncs.c | 3 +++
src/backend/optimizer/path/joinrels.c | 9 ++++++++
src/backend/optimizer/plan/createplan.c | 18 +++++++++++++++-
src/backend/optimizer/plan/planner.c | 8 +++++++
src/backend/optimizer/plan/setrefs.c | 28 +++++++++++++++++++++++++
src/backend/optimizer/util/inherit.c | 16 ++++++++++++++
src/backend/optimizer/util/relnode.c | 20 ++++++++++++++++++
src/include/nodes/pathnodes.h | 22 +++++++++++++++++++
src/include/nodes/plannodes.h | 17 +++++++++++++++
11 files changed, 148 insertions(+), 1 deletion(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 55f720a88f..dc68a12486 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -106,6 +106,7 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_NODE_FIELD(invalItems);
COPY_NODE_FIELD(paramExecTypes);
COPY_NODE_FIELD(utilityStmt);
+ COPY_BITMAPSET_FIELD(elidedAppendPartedRels);
COPY_LOCATION_FIELD(stmt_location);
COPY_SCALAR_FIELD(stmt_len);
@@ -253,6 +254,7 @@ _copyAppend(const Append *from)
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
@@ -281,6 +283,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6bdad462c7..bc178d53bf 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -324,6 +324,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
WRITE_NODE_FIELD(utilityStmt);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
WRITE_LOCATION_FIELD(stmt_location);
WRITE_INT_FIELD(stmt_len);
}
@@ -443,6 +444,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -460,6 +462,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -2288,6 +2291,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_BOOL_FIELD(parallelModeOK);
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_CHAR_FIELD(maxParallelHazard);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
}
static void
@@ -2399,6 +2403,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_BOOL_FIELD(partbounds_merged);
WRITE_BITMAPSET_FIELD(live_parts);
WRITE_BITMAPSET_FIELD(all_partrels);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3f68f7c18d..3c673c42d5 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1597,6 +1597,7 @@ _readPlannedStmt(void)
READ_NODE_FIELD(invalItems);
READ_NODE_FIELD(paramExecTypes);
READ_NODE_FIELD(utilityStmt);
+ READ_BITMAPSET_FIELD(elidedAppendPartedRels);
READ_LOCATION_FIELD(stmt_location);
READ_INT_FIELD(stmt_len);
@@ -1719,6 +1720,7 @@ _readAppend(void)
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
@@ -1741,6 +1743,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 9da3ff2f9a..e74d40fee3 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1549,6 +1549,15 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
child_restrictlist);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * joinrel's set.
+ */
+ joinrel->partitioned_rels =
+ bms_add_members(joinrel->partitioned_rels,
+ child_joinrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index fa069a217c..0026086591 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -26,10 +26,12 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
#include "optimizer/paramassign.h"
+#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
@@ -1331,11 +1333,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
best_path->subpaths,
prunequal);
}
-
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->partitioned_rels = bms_copy(rel->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1499,6 +1501,20 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ /*
+ * We need to explicitly add to the plan node the RT indexes of any
+ * partitioned tables whose partitions will be scanned by the nodes in
+ * 'subplans'. There can be multiple RT indexes in the set due to the
+ * partition tree being multi-level and/or this being a plan for UNION ALL
+ * over multiple partition trees. Along with scanrelids of leaf-level Scan
+ * nodes, this allows the executor to lock the full set of relations being
+ * scanned by this node.
+ *
+ * Note that 'apprelids' only contains the top-level base relation(s), so
+ * is not sufficient for the purpose.
+ */
+ node->partitioned_rels = bms_copy(rel->partitioned_rels);
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
* produce either the exact tlist or a narrow tlist, we should get rid of
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd09f85aea..374a9d9753 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -529,6 +529,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->paramExecTypes = glob->paramExecTypes;
/* utilityStmt should be null, but we might as well copy it */
result->utilityStmt = parse->utilityStmt;
+ result->elidedAppendPartedRels = glob->elidedAppendPartedRels;
result->stmt_location = parse->stmt_location;
result->stmt_len = parse->stmt_len;
@@ -7365,6 +7366,13 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, grouped_rel, grouped_live_children);
}
+
+ /*
+ * Input rel might be a partitioned appendrel, though grouped_rel has at
+ * this point taken its role as the appendrel owning the former's
+ * children, so copy the former's partitioned_rels set into the latter.
+ */
+ grouped_rel->partitioned_rels = bms_copy(input_rel->partitioned_rels);
}
/*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index a7b11b7f03..dbdeb8ec9d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1512,6 +1512,10 @@ set_append_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /* Fix up partitioned_rels before possibly removing the Append below. */
+ aplan->partitioned_rels = offset_relid_set(aplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the Append entirely. For this to be
* safe, there must be only one child plan and that child plan's parallel
@@ -1522,8 +1526,17 @@ set_append_references(PlannerInfo *root,
*/
if (list_length(aplan->appendplans) == 1 &&
((Plan *) linitial(aplan->appendplans))->parallel_aware == aplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned tables involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ aplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) aplan,
(Plan *) linitial(aplan->appendplans));
+ }
/*
* Otherwise, clean up the Append as needed. It's okay to do this after
@@ -1584,6 +1597,12 @@ set_mergeappend_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /*
+ * Fix up partitioned_rels before possibly removing the MergeAppend below.
+ */
+ mplan->partitioned_rels = offset_relid_set(mplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the MergeAppend entirely. For this to
* be safe, there must be only one child plan and that child plan's
@@ -1594,8 +1613,17 @@ set_mergeappend_references(PlannerInfo *root,
*/
if (list_length(mplan->mergeplans) == 1 &&
((Plan *) linitial(mplan->mergeplans))->parallel_aware == mplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned tables involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ mplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) mplan,
(Plan *) linitial(mplan->mergeplans));
+ }
/*
* Otherwise, clean up the MergeAppend as needed. It's okay to do this
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 7e134822f3..56912e4101 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -406,6 +406,14 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
childrte, childRTindex,
childrel, top_parentrc, lockmode);
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ relinfo->partitioned_rels = bms_add_members(relinfo->partitioned_rels,
+ childrelinfo->partitioned_rels);
+
/* Close child relation, but keep locks */
table_close(childrel, NoLock);
}
@@ -737,6 +745,14 @@ expand_appendrel_subquery(PlannerInfo *root, RelOptInfo *rel,
/* Child may itself be an inherited rel, either table or subquery. */
if (childrte->inh)
expand_inherited_rtentry(root, childrel, childrte, childRTindex);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ rel->partitioned_rels = bms_add_members(rel->partitioned_rels,
+ childrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..1d082a8fdd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -361,6 +361,10 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
}
}
+ /* A partitioned appendrel. */
+ if (rel->part_scheme != NULL)
+ rel->partitioned_rels = bms_copy(rel->relids);
+
/* Save the finished struct in the query's simple_rel_array */
root->simple_rel_array[relid] = rel;
@@ -729,6 +733,14 @@ build_join_rel(PlannerInfo *root,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/*
* Set the consider_parallel flag if this joinrel could potentially be
* scanned within a parallel worker. If this flag is false for either
@@ -897,6 +909,14 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/* We build the join only once. */
Assert(!find_join_rel(root, joinrel->relids));
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1f3845b3fe..5327d9ba8b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -130,6 +130,11 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
PartitionDirectory partition_directory; /* partition descriptors */
+
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
+ * that have been removed from the
+ * various plan trees. */
} PlannerGlobal;
/* macro for fetching the Plan associated with a SubPlan node */
@@ -773,6 +778,23 @@ typedef struct RelOptInfo
Relids all_partrels; /* Relids set of all partition relids */
List **partexprs; /* Non-nullable partition key expressions */
List **nullable_partexprs; /* Nullable partition key expressions */
+
+ /*
+ * For an appendrel parent relation (base, join, or upper) that is
+ * partitioned, this stores the RT indexes of all the partitioned ancestors
+ * including itself that lead up to the individual leaf partitions that
+ * will be scanned to produce this relation's output rows. The relid set
+ * is copied into the resulting Append or MergeAppend plan node for
+ * allowing the executor to take appropriate locks on those relations,
+ * unless the node is deemed useless in setrefs.c due to having a single
+ * leaf subplan and thus elided from the final plan, in which case, the set
+ * is added into PlannerGlobal.elidedAppendPartedRels.
+ *
+ * Note that 'apprelids' of those nodes only contains the top-level base
+ * relation(s), so is not sufficient for said purpose.
+ */
+
+ Bitmapset *partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0b518ce6b2..bd87c35d6c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -85,6 +85,11 @@ typedef struct PlannedStmt
Node *utilityStmt; /* non-null if this is utility stmt */
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
+ * that have been removed from the
+ * various plan trees. */
+
/* statement location in source string (copied from Query) */
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
@@ -261,6 +266,12 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in appendplans.
+ */
+ Bitmapset *partitioned_rels;
} Append;
/* ----------------
@@ -281,6 +292,12 @@ typedef struct MergeAppend
bool *nullsFirst; /* NULLS FIRST/LAST directions */
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in mergeplans.
+ */
+ Bitmapset *partitioned_rels;
} MergeAppend;
/* ----------------
--
2.24.1
[application/x-patch] v6-0004-Optimize-AcquireExecutorLocks-to-skip-pruned-part.patch (94.2K, 4-v6-0004-Optimize-AcquireExecutorLocks-to-skip-pruned-part.patch)
From 5e076f58274f6cd05afc8533af130e165c9b862e Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v6 4/4] Optimize AcquireExecutorLocks() to skip pruned
partitions
When the PlannedStmt indicates that some nodes in the plan tree can
do partition pruning without depending on execution having started
(so-called "initial" pruning), AcquireExecutorLocks() no longer
locks all relations listed in the range table.  Instead, it calls
the new executor function ExecutorGetLockRels(), which returns the
set of relations (their RT indexes) to be locked, excluding those
scanned by subplans that were pruned.
The result of pruning done this way must be remembered and reused
during actual execution of the plan, which is done by creating a
PlanInitPruningOutput node for each plan node that undergoes
pruning.  The set of those nodes for the whole plan tree is added to
ExecLockRelsInfo, which also stores the bitmapset of RT indexes of
relations that are actually locked by AcquireExecutorLocks().
ExecLockRelsInfos are passed down to the executor alongside the
PlannedStmts.  This arrangement ensures that the executor doesn't
accidentally try to process a plan tree subnode that has been
deemed pruned by AcquireExecutorLocks().
---
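As a reading aid for the commit message above, here is a minimal standalone
sketch of the lookup it describes: one pruning result per plan node that
underwent initial pruning, found through an array indexed by plan_node_id.
It is not code from the patch. The Toy* types and toy_* functions are
hypothetical stand-ins, the bitmapset is simplified to a 64-bit mask, and
the "1-based slot, 0 means no entry" encoding of ipo_indexes is an
assumption made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for PlanInitPruningOutput: bit i set => subplan i survived. */
typedef struct ToyPruningOutput
{
	uint64_t	valid_subplans;
} ToyPruningOutput;

/* Toy stand-in for ExecLockRelsInfo (the lockrels set is omitted). */
typedef struct ToyLockRelsInfo
{
	int			num_plan_nodes;
	int			num_outputs;
	int		   *ipo_indexes;	/* assumed: 1-based slot per plan_node_id, 0 = none */
	ToyPruningOutput *outputs;	/* at most one entry per plan node */
} ToyLockRelsInfo;

ToyLockRelsInfo *
toy_info_create(int num_plan_nodes)
{
	ToyLockRelsInfo *info = calloc(1, sizeof(ToyLockRelsInfo));

	assert(info != NULL && num_plan_nodes > 0);
	info->num_plan_nodes = num_plan_nodes;
	info->ipo_indexes = calloc((size_t) num_plan_nodes, sizeof(int));
	info->outputs = calloc((size_t) num_plan_nodes, sizeof(ToyPruningOutput));
	assert(info->ipo_indexes != NULL && info->outputs != NULL);
	return info;
}

void
toy_info_free(ToyLockRelsInfo *info)
{
	free(info->ipo_indexes);
	free(info->outputs);
	free(info);
}

/* Remember the result of "initial" pruning for one plan node. */
void
toy_store(ToyLockRelsInfo *info, int plan_node_id, uint64_t valid_subplans)
{
	assert(plan_node_id >= 0 && plan_node_id < info->num_plan_nodes);
	assert(info->ipo_indexes[plan_node_id] == 0);	/* store only once */
	info->outputs[info->num_outputs].valid_subplans = valid_subplans;
	info->num_outputs++;
	info->ipo_indexes[plan_node_id] = info->num_outputs;
}

/* Return the stored result for a plan node, or NULL if it was never pruned. */
const ToyPruningOutput *
toy_fetch(const ToyLockRelsInfo *info, int plan_node_id)
{
	int			slot;

	assert(plan_node_id >= 0 && plan_node_id < info->num_plan_nodes);
	slot = info->ipo_indexes[plan_node_id];
	return slot > 0 ? &info->outputs[slot - 1] : NULL;
}

/* A subplan may be processed only if it survived pruning (or none was done). */
bool
toy_subplan_is_valid(const ToyLockRelsInfo *info, int plan_node_id, int subplan)
{
	const ToyPruningOutput *out = toy_fetch(info, plan_node_id);

	assert(subplan >= 0 && subplan < 64);
	if (out == NULL)
		return true;
	return (out->valid_subplans & (UINT64_C(1) << subplan)) != 0;
}
```

In this toy model, a consumer that walks the plan tree would call
toy_subplan_is_valid() before touching a child subplan, mirroring the
requirement that pruned subplans stay in the tree but must be ignored.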
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 13 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 17 +-
src/backend/executor/README | 24 +++
src/backend/executor/execMain.c | 202 ++++++++++++++++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 224 ++++++++++++++++++----
src/backend/executor/execUtils.c | 8 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 52 ++++-
src/backend/executor/nodeMergeAppend.c | 52 ++++-
src/backend/executor/nodeModifyTable.c | 25 +++
src/backend/executor/spi.c | 14 +-
src/backend/nodes/copyfuncs.c | 49 ++++-
src/backend/nodes/outfuncs.c | 39 ++++
src/backend/nodes/readfuncs.c | 37 ++++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 6 +
src/backend/partitioning/partprune.c | 37 +++-
src/backend/tcop/postgres.c | 15 +-
src/backend/tcop/pquery.c | 21 ++-
src/backend/utils/cache/plancache.c | 252 ++++++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execPartition.h | 2 +
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 2 +
src/include/executor/nodeAppend.h | 1 +
src/include/executor/nodeMergeAppend.h | 1 +
src/include/executor/nodeModifyTable.h | 1 +
src/include/nodes/execnodes.h | 96 ++++++++++
src/include/nodes/nodes.h | 5 +
src/include/nodes/pathnodes.h | 4 +
src/include/nodes/plannodes.h | 15 ++
src/include/tcop/tcopprot.h | 2 +-
src/include/utils/plancache.h | 6 +
src/include/utils/portal.h | 5 +
41 files changed, 1174 insertions(+), 104 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9f632285b6..1f1a44b9bb 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, execlockrelsinfo, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..008b8ce0e9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -741,8 +741,10 @@ execute_sql_string(const char *sql)
RawStmt *parsetree = lfirst_node(RawStmt, lc1);
MemoryContext per_parsetree_context,
oldcontext;
- List *stmt_list;
- ListCell *lc2;
+ List *stmt_list,
+ *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
/*
* We do the work for each parsetree in a short-lived context, to
@@ -762,11 +764,13 @@ execute_sql_string(const char *sql)
NULL,
0,
NULL);
- stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
+ stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL,
+ &execlockrelsinfo_list);
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
CommandCounterIncrement();
@@ -777,6 +781,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ execlockrelsinfo,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 05e7b60059..4ef44aaf23 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 9902c5c566..85e73ddded 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL), /* no ExecLockRelsInfo to pass */
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..bbbf8bbcbd 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *plan_execlockrelsinfo_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -195,6 +196,7 @@ ExecuteQuery(ParseState *pstate,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* DO NOT add any logic that could possibly throw an error between
@@ -204,7 +206,7 @@ ExecuteQuery(ParseState *pstate,
NULL,
query_string,
entry->plansource->commandTag,
- plan_list,
+ plan_list, plan_execlockrelsinfo_list,
cplan);
/*
@@ -576,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *plan_execlockrelsinfo_list;
+ ListCell *p,
+ *pe;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -632,15 +636,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pe, plan_execlockrelsinfo_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, pe);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, execlockrelsinfo, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index bf5e70860d..9720d0ac2c 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,27 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan has nodes that contain so-called initial pruning steps (a
+subset of execution pruning steps that do not depend on full-fledged execution
+having started), they are performed at this point to figure out the minimal
+set of child subplans that satisfy those pruning instructions and the result
+of performing that pruning is saved in a data structure that gets passed to
+the executor alongside the plan tree. Relations scanned by only those
+surviving subplans are then locked while those scanned by the pruned subplans
+are not, even though the pruned subplans themselves are not removed from the
+plan tree. So, it is imperative that the executor and any third party code
+invoked by it that gets passed the plan tree look at the initial pruning result
+made available via the aforementioned data structure to determine whether or
+not a particular subplan is valid. (The data structure basically consists of
+an array of PlanInitPruningOutput nodes containing one element for each node
+of the plan tree indexable using plan_node_id of the individual plan nodes,
+where each node contains a bitmapset of indexes of unpruned child subplans of
+a given node.)
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -247,6 +268,9 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorGetLockRels ] --- an optional step to walk over the plan tree
+ to produce an ExecLockRelsInfo to be passed to CreateQueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 473d2e00a2..1ddd1dfb83 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,11 +49,15 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/nodeAppend.h"
+#include "executor/nodeMergeAppend.h"
+#include "executor/nodeModifyTable.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -101,9 +105,205 @@ static char *ExecBuildSlotValueDescription(Oid reloid,
Bitmapset *modifiedCols,
int maxfieldlen);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static bool ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorGetLockRels
+ *
+ * Figure out the minimal set of relations to lock to be able to safely
+ * execute a given plan
+ *
+ * This ignores the relations scanned by child subplans that are pruned away
+ * after performing initial pruning steps present in the plan using the
+ * provided set of EXTERN parameters.
+ *
+ * Along with the set of RT indexes of relations that must be locked, the
+ * returned struct also contains an array of PlanInitPruningOutput nodes each
+ * of which contains the result of initial pruning for a given plan node, which
+ * is basically a bitmapset of the indexes of surviving child subplans. Each
+ * plan node in the tree that undergoes pruning will have an element in the
+ * array.
+ *
+ * Note that while relations scanned by the subplans that are pruned will not
+ * be locked, the subplans themselves are left as-is in the plan tree, assuming
+ * anything that reads the plan tree during execution knows to ignore them by
+ * looking at the PlanInitPruningOutput's list of valid subplans.
+ *
+ * Partitioned tables mentioned in PartitionedRelPruneInfo nodes that drive
+ * the pruning will be locked before doing the pruning and also added to the
+ * returned set.
+ */
+ExecLockRelsInfo *
+ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ int numPlanNodes = plannedstmt->numPlanNodes;
+ ExecGetLockRelsContext context;
+ ExecLockRelsInfo *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ context.stmt = plannedstmt;
+ context.params = params;
+
+ /*
+ * Go walk all the plan tree(s) present in the PlannedStmt, filling
+ * context.lockrels with only the relations from plan nodes that
+ * survive initial pruning and also the tables mentioned in
+ * partitioned_rels sets found in the plan.
+ */
+ context.lockrels = NULL;
+ context.initPruningOutputs = NIL;
+ context.ipoIndexes = palloc0(sizeof(int) * numPlanNodes);
+
+ /* All the subplans. */
+ foreach(lc, plannedstmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ (void) ExecGetLockRels(subplan, &context);
+ }
+
+ /* And the main tree. */
+ (void) ExecGetLockRels(plannedstmt->planTree, &context);
+
+ /*
+ * Also be sure to lock partitioned relations from any [Merge]Append nodes
+ * that were originally present but were ultimately left out from the plan
+ * due to being deemed no-op nodes.
+ */
+ context.lockrels = bms_add_members(context.lockrels,
+ plannedstmt->elidedAppendPartedRels);
+
+ result = makeNode(ExecLockRelsInfo);
+ result->lockrels = context.lockrels;
+ result->numPlanNodes = numPlanNodes;
+ result->initPruningOutputs = context.initPruningOutputs;
+ result->ipoIndexes = context.ipoIndexes;
+
+ return result;
+}
+
+/* ------------------------------------------------------------------------
+ * ExecGetLockRels
+ * Adds all the relations that will be scanned by 'node' and its child
+ * plans to context->lockrels after taking into account the effect
+ * of performing initial pruning if any
+ *
+ * context->stmt gives the PlannedStmt being inspected to access the plan's
+ * range table if needed and context->params the set of EXTERN parameters
+ * available to evaluate pruning parameters.
+ *
+ * If initial pruning is done, a PlanInitPruningOutput node containing the
+ * result of pruning will be stored in context->initPruningOutputs that will
+ * be made available to the executor to reuse.
+ * ------------------------------------------------------------------------
+ */
+bool
+ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context)
+{
+ /* Do nothing when we get to the end of a leaf of the tree. */
+ if (node == NULL)
+ return true;
+
+ /* Make sure there's enough stack available. */
+ check_stack_depth();
+
+ switch (nodeTag(node))
+ {
+ /* Currently, only these two nodes have prunable child subplans. */
+ case T_Append:
+ if (ExecGetAppendLockRels((Append *) node, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (ExecGetMergeAppendLockRels((MergeAppend *) node,
+ context))
+ return true;
+ break;
+
+ /*
+ * And these manipulate relations that must be added to context->lockrels.
+ */
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapIndexScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ case T_ForeignScan:
+ case T_SubqueryScan:
+ case T_CustomScan:
+ if (ExecGetScanLockRels((Scan *) node, context))
+ return true;
+ break;
+ case T_ModifyTable:
+ if (ExecGetModifyTableLockRels((ModifyTable *) node, context))
+ return true;
+ /* plan_tree_walker() will visit the subplan (outerNode) */
+ break;
+
+ default:
+ break;
+ }
+
+ /* Recurse to subnodes. */
+ return plan_tree_walker(node, ExecGetLockRels, (void *) context);
+}
+
+/*
+ * ExecGetScanLockRels
+ * Do ExecGetLockRels()'s work for a leaf Scan node
+ */
+static bool
+ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context)
+{
+ switch (nodeTag(scan))
+ {
+ case T_ForeignScan:
+ {
+ ForeignScan *fscan = (ForeignScan *) scan;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ fscan->fs_relids);
+ }
+ break;
+
+ case T_SubqueryScan:
+ {
+ SubqueryScan *sscan = (SubqueryScan *) scan;
+
+ (void) ExecGetLockRels((Plan *) sscan->subplan, context);
+ }
+ break;
+
+ case T_CustomScan:
+ {
+ CustomScan *cscan = (CustomScan *) scan;
+ ListCell *lc;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ cscan->custom_relids);
+ foreach(lc, cscan->custom_plans)
+ {
+ (void) ExecGetLockRels((Plan *) lfirst(lc), context);
+ }
+ }
+ break;
+
+ default:
+ context->lockrels = bms_add_member(context->lockrels,
+ scan->scanrelid);
+ break;
+ }
+
+ return true;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -805,6 +1005,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ ExecLockRelsInfo *execlockrelsinfo = queryDesc->execlockrelsinfo;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -824,6 +1025,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_execlockrelsinfo = execlockrelsinfo;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 5dd8ab7db2..02f2c27fdf 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_EXECLOCKRELSINFO UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
@@ -596,12 +598,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *execlockrelsinfo_data;
+ char *execlockrelsinfo_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int execlockrelsinfo_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +635,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ execlockrelsinfo_data = nodeToString(estate->es_execlockrelsinfo);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +662,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized ExecLockRelsInfo. */
+ execlockrelsinfo_len = strlen(execlockrelsinfo_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, execlockrelsinfo_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +761,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized ExecLockRelsInfo */
+ execlockrelsinfo_space = shm_toc_allocate(pcxt->toc, execlockrelsinfo_len);
+ memcpy(execlockrelsinfo_space, execlockrelsinfo_data, execlockrelsinfo_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ execlockrelsinfo_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1248,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *execlockrelsinfospace;
char *paramspace;
PlannedStmt *pstmt;
+ ExecLockRelsInfo *execlockrelsinfo;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1262,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied ExecLockRelsInfo. */
+ execlockrelsinfospace = shm_toc_lookup(toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ false);
+ execlockrelsinfo = (ExecLockRelsInfo *) stringToNode(execlockrelsinfospace);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, execlockrelsinfo,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7ff5a95f05..fddc97280e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -183,8 +184,13 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
-static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1483,8 +1489,9 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or even before during ExecutorGetLockRels().
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1496,10 +1503,17 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
* returned by the partition pruning code into subplan indexes. Also
- * determines the set of initially valid subplans by performing initial
- * pruning steps, only which need be initialized by the caller such as
- * ExecInitAppend. Maps in PartitionPruneState are updated to account
- * for initial pruning having eliminated some of the subplans, if any.
+ * determines the set of initially valid subplans, either by looking that
+ * up in the plan node's PlanInitPruningOutput, if one is found in
+ * EState.es_execlockrelsinfo, or by performing initial pruning steps.
+ * Only the subplans included in that need be initialized by the caller
+ * such as ExecInitAppend. Maps in PartitionPruneState are updated to
+ * account for initial pruning having eliminated some of the subplans,
+ * if any.
+ *
+ * ExecGetLockRelsDoInitialPruning:
+ * Do initial pruning as part of ExecGetLockRels() on the parent plan
+ * node
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
@@ -1514,9 +1528,10 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* ExecInitPartitionPruning
* Initialize data structure needed for run-time partition pruning
*
- * Initial pruning can be done immediately, so it is done here if needed and
- * the set of surviving partition subplans' indexes are added to the output
- * parameter *initially_valid_subplans.
+ * Initial pruning can be done immediately, so it is done here unless it has
+ * already been done by ExecGetLockRelsDoInitialPruning(), and the set of
+ * surviving partition subplans' indexes are added to the output parameter
+ * *initially_valid_subplans.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1530,22 +1545,57 @@ ExecInitPartitionPruning(PlanState *planstate,
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ Plan *plan = planstate->plan;
+ PlanInitPruningOutput *initPruningOutput = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /* Retrieve the parent plan's PlanInitPruningOutput, if any. */
+ if (estate->es_execlockrelsinfo)
+ {
+ initPruningOutput = (PlanInitPruningOutput *)
+ ExecFetchPlanInitPruningOutput(estate->es_execlockrelsinfo, plan);
- /*
- * Create the working data structure for pruning.
- */
- prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+ Assert(initPruningOutput != NULL &&
+ IsA(initPruningOutput, PlanInitPruningOutput));
+ /* No need to do initial pruning again, only exec pruning. */
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
+
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PlanInitPruningOutput.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo,
+ initPruningOutput == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune, if required.
*/
- if (prunestate->do_initial_prune)
+ if (initPruningOutput)
+ {
+ /* ExecGetLockRelsDoInitialPruning() already did it for us! */
+ *initially_valid_subplans = initPruningOutput->initially_valid_subplans;
+ }
+ else if (prunestate && prunestate->do_initial_prune)
{
/* Determine which subplans survive initial pruning */
- *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate,
+ pruneinfo);
}
else
{
@@ -1563,7 +1613,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* invalid data in prunestate, because that data won't be consulted again
* (cf initial Assert in ExecFindMatchingSubPlans).
*/
- if (prunestate->do_exec_prune &&
+ if (prunestate && prunestate->do_exec_prune &&
bms_num_members(*initially_valid_subplans) < n_total_subplans)
PartitionPruneStateFixSubPlanMap(prunestate,
*initially_valid_subplans,
@@ -1572,12 +1622,75 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecGetLockRelsDoInitialPruning
+ * Perform initial pruning as part of doing ExecGetLockRels() on the parent
+ * plan node
+ */
+Bitmapset *
+ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo)
+{
+ List *rtable = context->stmt->rtable;
+ ParamListInfo params = context->params;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ PlanInitPruningOutput *initPruningOutput;
+
+ /*
+ * A temporary context to allocate stuff needed to run the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so must create
+ * a standalone ExprContext to evaluate pruning expressions, equipped with
+ * the information about the EXTERN parameters that the caller passed us.
+ * Note that that's okay because the initial pruning steps do not contain
+ * anything that requires the execution to have started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = ExecCreatePartitionPruneState(NULL, pruneinfo,
+ true, false,
+ rtable, econtext,
+ pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the pruning and populate a PlanInitPruningOutput for this node. */
+ initPruningOutput = makeNode(PlanInitPruningOutput);
+ initPruningOutput->initially_valid_subplans =
+ ExecFindInitialMatchingSubPlans(prunestate, pruneinfo);
+ ExecStorePlanInitPruningOutput(context, initPruningOutput, plan);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return initPruningOutput->initially_valid_subplans;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
* ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'partitionpruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1592,19 +1705,20 @@ ExecInitPartitionPruning(PlanState *planstate,
*/
static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1655,19 +1769,48 @@ ExecCreatePartitionPruneState(PlanState *planstate,
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before the
+ * execution has started, such as when called during
+ * ExecutorGetLockRels() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1769,7 +1912,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1779,7 +1922,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -1893,7 +2036,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
* is required.
*/
static Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1903,8 +2047,8 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
Assert(prunestate->do_initial_prune);
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * Switch to a temp context to avoid leaking memory in the longer-term
+ * memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..7246f9175f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_execlockrelsinfo = NULL;
estate->es_junkFilter = NULL;
@@ -785,6 +786,13 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ /*
+ * Cross-check that AcquireExecutorLocks() hasn't missed any relation
+ * that it should have locked.
+ */
+ Assert(estate->es_execlockrelsinfo == NULL ||
+ bms_is_member(rti, estate->es_execlockrelsinfo->lockrels));
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5b6d3eb23b..9c6f907687 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -94,6 +94,55 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
+/* ----------------------------------------------------------------
+ * ExecGetAppendLockRels
+ * Do ExecGetLockRels()'s work for an Append plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this Append.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->appendplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitAppend
*
@@ -155,7 +204,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 9a9f29e845..4b04fcdbc2 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -54,6 +54,55 @@ typedef int32 SlotNumber;
static TupleTableSlot *ExecMergeAppend(PlanState *pstate);
static int heap_compare_slots(Datum a, Datum b, void *arg);
+/* ----------------------------------------------------------------
+ * ExecGetMergeAppendLockRels
+ * Do ExecGetLockRels()'s work for a MergeAppend plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this MergeAppend.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->mergeplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitMergeAppend
@@ -103,7 +152,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 701fe05296..23df3efef0 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3008,6 +3008,31 @@ ExecLookupResultRelByOid(ModifyTableState *node, Oid resultoid,
return NULL;
}
+/*
+ * ExecGetModifyTableLockRels
+ * Do ExecGetLockRels()'s work for a ModifyTable plan
+ */
+bool
+ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context)
+{
+ ListCell *lc;
+
+ /* First add the result relation RTIs mentioned in the node. */
+ if (plan->rootRelation > 0)
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->rootRelation);
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->nominalRelation);
+ foreach(lc, plan->resultRelations)
+ {
+ context->lockrels = bms_add_member(context->lockrels,
+ lfirst_int(lc));
+ }
+
+ /* Tell the caller to recurse to the subplan (outerPlan(plan)). */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitModifyTable
* ----------------------------------------------------------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index a82e986667..2107009591 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *execlockrelsinfo_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1659,6 +1660,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
if (!plan->saved)
{
@@ -1670,6 +1672,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
+ execlockrelsinfo_list = copyObject(execlockrelsinfo_list);
MemoryContextSwitchTo(oldcontext);
ReleaseCachedPlan(cplan, NULL);
cplan = NULL; /* portal shouldn't depend on cplan */
@@ -1683,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ execlockrelsinfo_list,
cplan);
/*
@@ -2473,7 +2477,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2552,6 +2558,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2589,9 +2596,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2671,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, execlockrelsinfo,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index dc68a12486..1b94d7c881 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -68,6 +68,13 @@
} \
} while (0)
+/* Copy a field that is an array with numElem ints */
+#define COPY_INT_ARRAY(fldname, numElem) \
+ do { \
+ newnode->fldname = (numElem) > 0 ? palloc((numElem) * sizeof(int)) : NULL; \
+ if ((numElem) > 0) memcpy(newnode->fldname, from->fldname, sizeof(int) * (numElem)); \
+ } while (0)
+
/* Copy a parse location field (for Copy, this is same as scalar case) */
#define COPY_LOCATION_FIELD(fldname) \
(newnode->fldname = from->fldname)
@@ -94,8 +101,10 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_SCALAR_FIELD(numPlanNodes);
COPY_NODE_FIELD(rtable);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
@@ -1281,6 +1290,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -4944,6 +4955,33 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static ExecLockRelsInfo *
+_copyExecLockRelsInfo(const ExecLockRelsInfo *from)
+{
+ ExecLockRelsInfo *newnode = makeNode(ExecLockRelsInfo);
+
+ COPY_BITMAPSET_FIELD(lockrels);
+ COPY_SCALAR_FIELD(numPlanNodes);
+ COPY_NODE_FIELD(initPruningOutputs);
+ COPY_INT_ARRAY(ipoIndexes, from->numPlanNodes);
+
+ return newnode;
+}
+
+static PlanInitPruningOutput *
+_copyPlanInitPruningOutput(const PlanInitPruningOutput *from)
+{
+ PlanInitPruningOutput *newnode = makeNode(PlanInitPruningOutput);
+
+ COPY_BITMAPSET_FIELD(initially_valid_subplans);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -4998,7 +5036,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -5947,6 +5984,16 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ retval = _copyExecLockRelsInfo(from);
+ break;
+ case T_PlanInitPruningOutput:
+ retval = _copyPlanInitPruningOutput(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index bc178d53bf..6c404c8664 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,8 +312,10 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_INT_FIELD(numPlanNodes);
WRITE_NODE_FIELD(rtable);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
@@ -1007,6 +1009,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2702,6 +2706,31 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outExecLockRelsInfo(StringInfo str, const ExecLockRelsInfo *node)
+{
+ WRITE_NODE_TYPE("EXECLOCKRELSINFO");
+
+ WRITE_BITMAPSET_FIELD(lockrels);
+ WRITE_INT_FIELD(numPlanNodes);
+ WRITE_NODE_FIELD(initPruningOutputs);
+ WRITE_INT_ARRAY(ipoIndexes, node->numPlanNodes);
+}
+
+static void
+_outPlanInitPruningOutput(StringInfo str, const PlanInitPruningOutput *node)
+{
+ WRITE_NODE_TYPE("PARTITIONINITPRUNINGOUTPUT");
+
+ WRITE_BITMAPSET_FIELD(initially_valid_subplans);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4543,6 +4572,16 @@ outNode(StringInfo str, const void *obj)
_outPartitionRangeDatum(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ _outExecLockRelsInfo(str, obj);
+ break;
+ case T_PlanInitPruningOutput:
+ _outPlanInitPruningOutput(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 3c673c42d5..863f082729 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1585,8 +1585,10 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_INT_FIELD(numPlanNodes);
READ_NODE_FIELD(rtable);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
@@ -2537,6 +2539,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2706,6 +2710,35 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+/*
+ * _readExecLockRelsInfo
+ */
+static ExecLockRelsInfo *
+_readExecLockRelsInfo(void)
+{
+ READ_LOCALS(ExecLockRelsInfo);
+
+ READ_BITMAPSET_FIELD(lockrels);
+ READ_INT_FIELD(numPlanNodes);
+ READ_NODE_FIELD(initPruningOutputs);
+ READ_INT_ARRAY(ipoIndexes, local_node->numPlanNodes);
+
+ READ_DONE();
+}
+
+/*
+ * _readPlanInitPruningOutput
+ */
+static PlanInitPruningOutput *
+_readPlanInitPruningOutput(void)
+{
+ READ_LOCALS(PlanInitPruningOutput);
+
+ READ_BITMAPSET_FIELD(initially_valid_subplans);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -2977,6 +3010,10 @@ parseNodeString(void)
return_value = _readPartitionBoundSpec();
else if (MATCH("PARTITIONRANGEDATUM", 19))
return_value = _readPartitionRangeDatum();
+ else if (MATCH("EXECLOCKRELSINFO", 16))
+ return_value = _readExecLockRelsInfo();
+ else if (MATCH("PARTITIONINITPRUNINGOUTPUT", 26))
+ return_value = _readPlanInitPruningOutput();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 374a9d9753..329fb9d6e7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,7 +517,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->planTree = top_plan;
+ result->numPlanNodes = glob->lastPlanNodeId;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dbdeb8ec9d..ac795ae9d9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1561,6 +1561,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (aplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1648,6 +1651,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7080cb25d9..3322dc79f2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!needs_init_pruning)
+ needs_init_pruning = partrel_needs_init_pruning;
+ if (!needs_exec_pruning)
+ needs_exec_pruning = partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*needs_init_pruning)
+ *needs_init_pruning = (initial_pruning_steps != NIL);
+ if (!*needs_exec_pruning)
+ *needs_exec_pruning = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ba2fcfeb4a..085eb3f209 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -945,15 +945,17 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
* For normal optimizable statements, invoke the planner. For utility
* statements, just make a wrapper PlannedStmt node.
*
- * The result is a list of PlannedStmt nodes.
+ * The result is a list of PlannedStmt nodes. Also, a NULL is appended to
+ * *execlockrelsinfo_list for each PlannedStmt added to the returned list.
*/
List *
pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
- ParamListInfo boundParams)
+ ParamListInfo boundParams, List **execlockrelsinfo_list)
{
List *stmt_list = NIL;
ListCell *query_list;
+ *execlockrelsinfo_list = NIL;
foreach(query_list, querytrees)
{
Query *query = lfirst_node(Query, query_list);
@@ -977,6 +979,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
}
stmt_list = lappend(stmt_list, stmt);
+ *execlockrelsinfo_list = lappend(*execlockrelsinfo_list, NULL);
}
return stmt_list;
@@ -1080,7 +1083,8 @@ exec_simple_query(const char *query_string)
QueryCompletion qc;
MemoryContext per_parsetree_context = NULL;
List *querytree_list,
- *plantree_list;
+ *plantree_list,
+ *plantree_execlockrelsinfo_list;
Portal portal;
DestReceiver *receiver;
int16 format;
@@ -1167,7 +1171,8 @@ exec_simple_query(const char *query_string)
NULL, 0, NULL);
plantree_list = pg_plan_queries(querytree_list, query_string,
- CURSOR_OPT_PARALLEL_OK, NULL);
+ CURSOR_OPT_PARALLEL_OK, NULL,
+ &plantree_execlockrelsinfo_list);
/*
* Done with the snapshot used for parsing/planning.
@@ -1203,6 +1208,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ plantree_execlockrelsinfo_list,
NULL);
/*
@@ -1991,6 +1997,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ cplan->execlockrelsinfo_list,
cplan);
/* Done with the snapshot used for parameter I/O and parsing/planning */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f907831a3..972ddc014e 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->execlockrelsinfo = execlockrelsinfo; /* ExecutorGetLockRels() output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * execlockrelsinfo: ExecutorGetLockRels() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, execlockrelsinfo, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -490,6 +494,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ linitial_node(ExecLockRelsInfo, portal->execlockrelsinfos),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1190,7 +1195,8 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *stmtlist_item,
+ *execlockrelsinfolist_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1211,9 +1217,12 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ forboth(stmtlist_item, portal->stmts,
+ execlockrelsinfolist_item, portal->execlockrelsinfos)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo,
+ execlockrelsinfolist_item);
/*
* If we got a cancel signal in prior command, quit
@@ -1271,7 +1280,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1280,7 +1289,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..9f5a40a0a6 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,16 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
+static void CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static List *AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams);
+static void ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,9 +792,21 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * If the CachedPlan is valid, AcquireExecutorLocks() may call
+ * ExecutorGetLockRels() on the PlannedStmts contained in it to determine the
+ * set of relations to lock, instead of just scanning their range tables.
+ * That allows skipping relations scanned only by plan nodes that initial
+ * partition pruning has eliminated. The resulting ExecLockRelsInfo nodes,
+ * which carry the result of that pruning, are saved in
+ * plan->execlockrelsinfo_list, allocated in a child context of the context
+ * containing the plan itself. The previous contents of the list from the
+ * last invocation on the same CachedPlan are deleted, because they would no
+ * longer be valid given the fresh set of parameter values, which may be used
+ * as pruning parameters.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -820,13 +834,25 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *execlockrelsinfo_list;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. ExecutorGetLockRels() may ask to
+ * omit some relations because the plan nodes that scan them were found
+ * to be pruned. In that case, the executor must not try to execute
+ * those plan nodes either, so it is informed of their omission via the
+ * ExecLockRelsInfo nodes collected in the returned list. That list is
+ * passed to the executor along with the list of PlannedStmts, one
+ * element per statement.
+ */
+ execlockrelsinfo_list = AcquireExecutorLocks(plan->stmt_list,
+ boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -844,11 +870,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (plan->is_valid)
{
/* Successfully revalidated and locked the query. */
+
+ /* Remember ExecLockRelsInfos in the CachedPlan. */
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
return true;
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, execlockrelsinfo_list);
}
/*
@@ -880,7 +909,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv)
{
CachedPlan *plan;
- List *plist;
+ List *plist,
+ *execlockrelsinfo_list;
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
@@ -933,7 +963,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* Generate the plan.
*/
plist = pg_plan_queries(qlist, plansource->query_string,
- plansource->cursor_options, boundParams);
+ plansource->cursor_options, boundParams,
+ &execlockrelsinfo_list);
/* Release snapshot if we got one */
if (snapshot_set)
@@ -1002,6 +1033,16 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_saved = false;
plan->is_valid = true;
+ /*
+ * Save the dummy ExecLockRelsInfo list, that is a list containing NULLs
+ * as elements. We must do this, because users of the CachedPlan expect
+ * one to go with the list of PlannedStmts.
+ * XXX maybe get rid of that contract.
+ */
+ plan->execlockrelsinfo_context = NULL;
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
+ Assert(MemoryContextIsValid(plan->execlockrelsinfo_context));
+
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1160,7 +1201,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1586,6 +1627,49 @@ CopyCachedPlan(CachedPlanSource *plansource)
return newsource;
}
+/*
+ * CachedPlanSaveExecLockRelsInfos
+ * Save the list containing ExecLockRelsInfo nodes into the given
+ * CachedPlan
+ *
+ * The provided list is copied into a dedicated context that is a child of
+ * plan->context. If the child context already exists, it is emptied, because
+ * any ExecLockRelsInfo contained therein would no longer be useful.
+ */
+static void
+CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list)
+{
+ MemoryContext execlockrelsinfo_context = plan->execlockrelsinfo_context,
+ oldcontext = CurrentMemoryContext;
+ List *execlockrelsinfo_list_copy;
+
+ /*
+ * Set up the dedicated context if not already done, saving it as a child
+ * of the CachedPlan's context.
+ */
+ if (execlockrelsinfo_context == NULL)
+ {
+ execlockrelsinfo_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan execlockrelsinfo list",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextSetParent(execlockrelsinfo_context, plan->context);
+ MemoryContextSetIdentifier(execlockrelsinfo_context, plan->context->ident);
+ plan->execlockrelsinfo_context = execlockrelsinfo_context;
+ }
+ else
+ {
+ /* Just clear existing contents by resetting the context. */
+ Assert(MemoryContextIsValid(execlockrelsinfo_context));
+ MemoryContextReset(execlockrelsinfo_context);
+ }
+
+ MemoryContextSwitchTo(execlockrelsinfo_context);
+ execlockrelsinfo_list_copy = copyObject(execlockrelsinfo_list);
+ MemoryContextSwitchTo(oldcontext);
+
+ plan->execlockrelsinfo_list = execlockrelsinfo_list_copy;
+}
+
/*
* CachedPlanIsValid: test whether the rewritten querytree within a
* CachedPlanSource is currently valid (that is, not marked as being in need
@@ -1737,17 +1821,21 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * Returns a list of ExecLockRelsInfo nodes containing one element for each
+ * PlannedStmt in stmt_list or NULL if the latter is a utility statement or its
+ * containsInitialPruning is false.
*/
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+static List *
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams)
{
ListCell *lc1;
+ List *execlockrelsinfo_list = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ ExecLockRelsInfo *execlockrelsinfo = NULL;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,27 +1849,139 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
- continue;
+ ScanQueryForLocks(query, true);
}
-
- foreach(lc2, plannedstmt->rtable)
+ else
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (!plannedstmt->containsInitialPruning)
+ {
+ /*
+ * If the plan contains no initial pruning steps, just lock
+ * all the relations found in the range table.
+ */
+ ListCell *lc;
- if (rte->rtekind != RTE_RELATION)
- continue;
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ /*
+ * Acquire the appropriate type of lock on each relation
+ * OID. Note that we don't actually try to open the rel,
+ * and hence will not fail if it's been dropped entirely
+ * --- we'll just transiently acquire a non-conflicting
+ * lock.
+ */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ else
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ /*
+ * Walk the plan tree to find only the minimal set of
+ * relations to be locked, considering the effect of performing
+ * initial partition pruning.
+ */
+ execlockrelsinfo = ExecutorGetLockRels(plannedstmt, boundParams);
+ lockrels = execlockrelsinfo->lockrels;
+
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment above. */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ }
+
+ /*
+ * Remember ExecLockRelsInfo for later adding to the QueryDesc that
+ * will be passed to the executor when executing this plan. May be
+ * NULL, but must keep the list the same length as stmt_list.
+ */
+ execlockrelsinfo_list = lappend(execlockrelsinfo_list,
+ execlockrelsinfo);
+ }
+
+ return execlockrelsinfo_list;
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, execlockrelsinfo_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc2);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
/*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ }
+ else
+ {
+ if (execlockrelsinfo == NULL)
+ {
+ ListCell *lc;
+
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ lockrels = execlockrelsinfo->lockrels;
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..896f51be08 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -285,6 +285,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan)
{
AssertArg(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->execlockrelsinfos = execlockrelsinfos;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..fef75ba147 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fd5735a946..ded19b8cbb 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,4 +124,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
+extern Bitmapset *ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..4338463479 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecLockRelsInfo *execlockrelsinfo; /* ExecutorGetLockRels()'s output given plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 82925b4b63..5cf414cc11 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern ExecLockRelsInfo *ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params);
+extern bool ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 4cb78ee5b6..b53535c2a4 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern bool ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context);
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
diff --git a/src/include/executor/nodeMergeAppend.h b/src/include/executor/nodeMergeAppend.h
index 97fe3b0665..8eb4e9df93 100644
--- a/src/include/executor/nodeMergeAppend.h
+++ b/src/include/executor/nodeMergeAppend.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern bool ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context);
extern MergeAppendState *ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags);
extern void ExecEndMergeAppend(MergeAppendState *node);
extern void ExecReScanMergeAppend(MergeAppendState *node);
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 1d225bc88d..5006499088 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -19,6 +19,7 @@ extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
EState *estate, TupleTableSlot *slot,
CmdType cmdtype);
+extern bool ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context);
extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
extern void ExecEndModifyTable(ModifyTableState *node);
extern void ExecReScanModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 44dd73fc80..1253fdb0ed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -576,6 +576,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct ExecLockRelsInfo *es_execlockrelsinfo; /* QueryDesc.execlockrelsinfo */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -964,6 +965,101 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * ExecLockRelsInfo
+ *
+ * Result of performing ExecutorGetLockRels() for a given PlannedStmt
+ */
+typedef struct ExecLockRelsInfo
+{
+ NodeTag type;
+
+ /*
+ * Relations that must be locked to execute the plan tree contained in
+ * the PlannedStmt.
+ */
+ Bitmapset *lockrels;
+
+ /* PlannedStmt.numPlanNodes */
+ int numPlanNodes;
+
+ /*
+ * List of PlanInitPruningOutput, each representing the output of
+ * performing initial pruning on a given plan node, for all nodes in the
+ * plan tree that have been marked as needing initial pruning.
+ *
+ * 'ipoIndexes' is an array of 'numPlanNodes' elements, indexed with
+ * plan_node_id of the individual nodes in the plan tree, each a 1-based
+ * index into 'initPruningOutputs' list for a given plan node. 0 means
+ * that a given plan node has no entry in the list because of not needing
+ * any initial pruning done on it.
+ */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecLockRelsInfo;
+
+/*----------------
+ * ExecGetLockRelsContext
+ *
+ * Information pertaining to ExecutorGetLockRels() invocation for a given
+ * plan.
+ */
+typedef struct ExecGetLockRelsContext
+{
+ NodeTag type;
+
+ PlannedStmt *stmt; /* target plan */
+ ParamListInfo params; /* EXTERN parameters available for pruning */
+
+ /* Output parameters for ExecGetLockRels and its subroutines. */
+ Bitmapset *lockrels;
+
+ /* See the comment in the definition of ExecLockRelsInfo struct. */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecGetLockRelsContext;
+
+/*
+ * Appends the provided PlanInitPruningOutput to
+ * ExecGetLockRelsContext.initPruningOutputs
+ */
+#define ExecStorePlanInitPruningOutput(cxt, initPruningOutput, plannode) \
+ do { \
+ (cxt)->initPruningOutputs = lappend((cxt)->initPruningOutputs, initPruningOutput); \
+ (cxt)->ipoIndexes[(plannode)->plan_node_id] = list_length((cxt)->initPruningOutputs); \
+ } while (0)
+
+/*
+ * Finds the PlanInitPruningOutput for a given Plan node in
+ * ExecLockRelsInfo.initPruningOutputs.
+ */
+#define ExecFetchPlanInitPruningOutput(execlockrelsinfo, plannode) \
+ (((execlockrelsinfo) != NULL && (execlockrelsinfo)->initPruningOutputs != NIL) ? \
+ list_nth((execlockrelsinfo)->initPruningOutputs, \
+ (execlockrelsinfo)->ipoIndexes[(plannode)->plan_node_id] - 1) : NULL)
+
+/* ---------------
+ * PlanInitPruningOutput
+ *
+ * Node to remember the result of performing initial partition pruning steps
+ * during ExecutorGetLockRels() on nodes that support pruning.
+ *
+ * ExecGetLockRelsDoInitialPruning(), which runs during ExecutorGetLockRels(),
+ * creates it and stores it in the corresponding ExecLockRelsInfo.
+ *
+ * ExecInitPartitionPruning(), which runs during ExecutorStart(), fetches it
+ * from the EState's ExecLockRelsInfo (if any) and uses the value of
+ * initially_valid_subplans contained in it as-is to select the subplans to be
+ * initialized for execution, instead of re-evaluating that by performing
+ * initial pruning again.
+ */
+typedef struct PlanInitPruningOutput
+{
+ NodeTag type;
+
+ Bitmapset *initially_valid_subplans;
+} PlanInitPruningOutput;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 5d075f0c34..d365fc4402 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -96,6 +96,11 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_ExecGetLockRelsContext,
+ T_ExecLockRelsInfo,
+ T_PlanInitPruningOutput,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 5327d9ba8b..019719c1a4 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -129,6 +129,10 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
PartitionDirectory partition_directory; /* partition descriptors */
Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index bd87c35d6c..bfdb5bbf28 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -59,10 +59,16 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
+ int numPlanNodes; /* number of nodes in planTree */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -1189,6 +1195,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1197,6 +1210,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 92291a750d..bf80c53bed 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -64,7 +64,7 @@ extern PlannedStmt *pg_plan_query(Query *querytree, const char *query_string,
ParamListInfo boundParams);
extern List *pg_plan_queries(List *querytrees, const char *query_string,
int cursorOptions,
- ParamListInfo boundParams);
+ ParamListInfo boundParams, List **execlockrelsinfo_list);
extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
extern void assign_max_stack_depth(int newval, void *extra);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..56b0dcc6bd 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -148,6 +148,9 @@ typedef struct CachedPlan
{
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
+ List *execlockrelsinfo_list; /* list of ExecLockRelsInfo nodes with one
+ * element for each of stmt_list; NIL
+ * if not a generic plan */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
@@ -158,6 +161,9 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext execlockrelsinfo_context; /* context containing
+ * execlockrelsinfo_list,
+ * a child of the above context */
} CachedPlan;
/*
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9abace6734 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,10 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *execlockrelsinfos; /* list of ExecLockRelsInfo nodes with one element
+ * for each of 'stmts'; same as
+ * cplan->execlockrelsinfo_list if cplan is
+ * not NULL */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -241,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.24.1
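
The ipoIndexes scheme described in the ExecLockRelsInfo comment above (one slot per plan_node_id holding a 1-based index into initPruningOutputs, with 0 meaning "no entry") can be sketched in isolation as follows. This is only an illustration under simplifying assumptions: the struct and function names (PruningOutput, LockRelsInfo, store_output, fetch_output) are hypothetical, fixed-size arrays stand in for the List and the palloc'd index array, and a plain bitmask stands in for the Bitmapset. Unlike the ExecFetchPlanInitPruningOutput macro as posted, the sketch explicitly returns NULL when the stored index is 0; whether the patch's callers can ever hit that case is not visible in this excerpt.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PLAN_NODES 16
#define MAX_OUTPUTS 8

/* Hypothetical stand-in for PlanInitPruningOutput. */
typedef struct PruningOutput
{
    unsigned int valid_subplans;    /* bitmask of surviving subplans */
} PruningOutput;

/* Hypothetical, simplified stand-in for ExecLockRelsInfo. */
typedef struct LockRelsInfo
{
    int           num_plan_nodes;               /* number of valid slots in ipo_indexes */
    int           num_outputs;                  /* number of entries in outputs */
    PruningOutput outputs[MAX_OUTPUTS];         /* stands in for initPruningOutputs */
    int           ipo_indexes[MAX_PLAN_NODES];  /* by plan_node_id; 0 = no entry */
} LockRelsInfo;

/* Append an output and remember its 1-based position for the plan node. */
static void
store_output(LockRelsInfo *info, int plan_node_id, PruningOutput out)
{
    assert(info->num_plan_nodes <= MAX_PLAN_NODES);
    assert(plan_node_id >= 0 && plan_node_id < info->num_plan_nodes);
    assert(info->num_outputs < MAX_OUTPUTS);

    info->outputs[info->num_outputs++] = out;
    info->ipo_indexes[plan_node_id] = info->num_outputs;
}

/* Look up the output for a plan node; NULL if none was stored. */
static const PruningOutput *
fetch_output(const LockRelsInfo *info, int plan_node_id)
{
    int idx;

    if (info == NULL || info->num_outputs == 0)
        return NULL;

    assert(plan_node_id >= 0 && plan_node_id < info->num_plan_nodes);
    idx = info->ipo_indexes[plan_node_id];

    return (idx > 0) ? &info->outputs[idx - 1] : NULL;
}
```

Using 0 as the "no entry" marker means a zero-initialized index array needs no further setup, which is presumably why the indexes are 1-based.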
[application/x-patch] v6-0001-Some-refactoring-of-runtime-pruning-code.patch (26.5K, 5-v6-0001-Some-refactoring-of-runtime-pruning-code.patch)
From df8186c0e4a76f31c1f803a953f2c98ac88f9dc8 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 2 Mar 2022 15:17:55 +0900
Subject: [PATCH v6 1/4] Some refactoring of runtime pruning code
This does two things mainly:
* Move the execution pruning initialization steps that are common
between both ExecInitAppend() and ExecInitMergeAppend() into a new
function ExecInitPartitionPruning() defined in execPartition.c.
Thus, ExecCreatePartitionPruneState() and
ExecFindInitialMatchingSubPlans() need not be exported.
* Add an ExprContext field to PartitionPruneContext to remove the
implicit assumption in the runtime pruning code that the ExprContext
to use to compute pruning expressions that need one can always rely
on the PlanState providing it. A future patch will allow runtime
pruning (at least the initial pruning steps) to be performed without
the corresponding PlanState yet having been created, so this will
help.
---
src/backend/executor/execPartition.c | 340 ++++++++++++++++---------
src/backend/executor/nodeAppend.c | 33 +--
src/backend/executor/nodeMergeAppend.c | 32 +--
src/backend/partitioning/partprune.c | 20 +-
src/include/executor/execPartition.h | 9 +-
src/include/partitioning/partprune.h | 2 +
6 files changed, 252 insertions(+), 184 deletions(-)
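
The ExecInitPartitionPruning() comment in this patch says that subplan_map arrays are re-sequenced so that they no longer count subplans removed by initial pruning. A minimal sketch of that renumbering is given below, assuming a single flat map from partition index to subplan index with -1 for "no subplan". The function name resequence_subplan_map and the boolean array are hypothetical; the real PartitionPruneStateFixSubPlanMap() works on the PartitionPruneState and Bitmapsets, and its body is not shown in this excerpt.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SUBPLANS 64

/*
 * Renumber subplan_map[] in place so that surviving subplans are numbered
 * 0..n-1 in their original order, and partitions whose subplan was pruned
 * map to -1.  valid[s] tells whether subplan s survived initial pruning.
 * Returns the number of surviving subplans.
 */
static int
resequence_subplan_map(int *subplan_map, int nparts,
                       const bool *valid, int n_total_subplans)
{
    int new_index[MAX_SUBPLANS];
    int next = 0;

    assert(n_total_subplans > 0 && n_total_subplans <= MAX_SUBPLANS);

    /* First pass: assign each surviving subplan its new position. */
    for (int s = 0; s < n_total_subplans; s++)
        new_index[s] = valid[s] ? next++ : -1;

    /* Second pass: translate every partition's old index to the new one. */
    for (int i = 0; i < nparts; i++)
    {
        if (subplan_map[i] >= 0)
        {
            assert(subplan_map[i] < n_total_subplans);
            subplan_map[i] = new_index[subplan_map[i]];
        }
    }

    return next;
}
```

For example, with a map of {0, 1, 2, -1, 3} and subplan 1 pruned, the map becomes {0, -1, 1, -1, 2}. As the patch notes, this step only matters when execution-time pruning will consult the map again later.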
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..7ff5a95f05 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,11 +182,18 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
bool *isnull,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
+static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *partitionpruneinfo);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
+static void PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1485,30 +1492,86 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
*
* Functions:
*
- * ExecCreatePartitionPruneState:
+ * ExecInitPartitionPruning:
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
- * returned by the partition pruning code into subplan indexes.
- *
- * ExecFindInitialMatchingSubPlans:
- * Returns indexes of matching subplans. Partition pruning is attempted
- * without any evaluation of expressions containing PARAM_EXEC Params.
- * This function must be called during executor startup for the parent
- * plan before the subplans themselves are initialized. Subplans which
- * are found not to match by this function must be removed from the
- * plan's list of subplans during execution, as this function performs a
- * remap of the partition index to subplan index map and the newly
- * created map provides indexes only for subplans which remain after
- * calling this function.
+ * returned by the partition pruning code into subplan indexes. Also
+ * determines the set of initially valid subplans by performing initial
+ * pruning steps; only those need be initialized by a caller such as
+ * ExecInitAppend. Maps in PartitionPruneState are updated to account
+ * for initial pruning having eliminated some of the subplans, if any.
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
- * expressions. This function can only be called during execution and
- * must be called again each time the value of a Param listed in
- * PartitionPruneState's 'execparamids' changes.
+ * expressions, that is, using execution pruning steps. This function can
+ * only be called during execution and must be called again each time
+ * the value of a Param listed in PartitionPruneState's 'execparamids'
+ * changes.
*-------------------------------------------------------------------------
*/
+/*
+ * ExecInitPartitionPruning
+ * Initialize data structure needed for run-time partition pruning
+ *
+ * Initial pruning can be done immediately, so it is done here if needed and
+ * the set of surviving partition subplans' indexes is stored in the output
+ * parameter *initially_valid_subplans.
+ *
+ * If subplans are indeed pruned, subplan_map arrays contained in the returned
+ * PartitionPruneState are re-sequenced to not count those, though only if the
+ * maps will be needed for subsequent execution pruning passes.
+ */
+PartitionPruneState *
+ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans)
+{
+ PartitionPruneState *prunestate;
+ EState *estate = planstate->state;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /*
+ * Create the working data structure for pruning.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+
+ /*
+ * Perform an initial partition prune, if required.
+ */
+ if (prunestate->do_initial_prune)
+ {
+ /* Determine which subplans survive initial pruning */
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ }
+ else
+ {
+ /* We'll need to initialize all subplans */
+ Assert(n_total_subplans > 0);
+ *initially_valid_subplans = bms_add_range(NULL, 0,
+ n_total_subplans - 1);
+ }
+
+ /*
+ * Re-sequence subplan indexes contained in prunestate to account for any
+ * that were removed above due to initial pruning.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in prunestate, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (prunestate->do_exec_prune &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ PartitionPruneStateFixSubPlanMap(prunestate,
+ *initially_valid_subplans,
+ n_total_subplans);
+
+ return prunestate;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
@@ -1527,7 +1590,7 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
*/
-PartitionPruneState *
+static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
PartitionPruneInfo *partitionpruneinfo)
{
@@ -1536,6 +1599,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
int n_part_hierarchies;
ListCell *lc;
int i;
+ ExprContext *econtext = planstate->ps_ExprContext;
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1709,7 +1773,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
@@ -1718,7 +1783,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -1746,7 +1812,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1767,6 +1834,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1795,8 +1863,20 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ /*
+ * When planstate is NULL, pruning_steps is known not to
+ * contain any expressions that depend on the parent plan.
+ * Information of any available EXTERN parameters must be
+ * passed explicitly in that case, which the caller must
+ * have made available via econtext.
+ */
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1809,18 +1889,11 @@ ExecInitPruningContext(PartitionPruneContext *context,
* pruning, disregarding any pruning constraints involving PARAM_EXEC
* Params.
*
- * If additional pruning passes will be required (because of PARAM_EXEC
- * Params), we must also update the translation data that allows conversion
- * of partition indexes into subplan indexes to account for the unneeded
- * subplans having been removed.
- *
* Must only be called once per 'prunestate', and only if initial pruning
* is required.
- *
- * 'nsubplans' must be passed as the total number of unpruned subplans.
*/
-Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+static Bitmapset *
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1845,14 +1918,20 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
PartitionedRelPruningData *pprune;
prunedata = prunestate->partprunedata[i];
+
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
pprune = &prunedata->partrelprunedata[0];
/* Perform pruning without using PARAM_EXEC Params */
find_matching_subplans_recurse(prunedata, pprune, true, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->initial_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -1865,118 +1944,120 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
MemoryContextReset(prunestate->prune_context);
+ return result;
+}
+
+/*
+ * PartitionPruneStateFixSubPlanMap
+ * Fix mapping of partition indexes to subplan indexes contained in
+ * prunestate by considering the new list of subplans that survived
+ * initial pruning
+ *
+ * Subplans that were previously indexed 0..(n_total_subplans - 1) are
+ * re-indexed to the range 0..(num(initially_valid_subplans) - 1).
+ */
+static void
+PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans)
+{
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
+ int i;
+ int newidx;
+
/*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
- *
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
*/
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
{
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
+ Assert(i < n_total_subplans);
+ new_subplan_indexes[i] = newidx++;
+ }
- /*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
- */
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
- {
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
+ /*
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
+ */
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
/*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
*/
- for (i = 0; i < prunestate->num_partprunedata; i++)
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
{
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
- /*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
- */
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
- {
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
+ for (k = 0; k < nparts; k++)
+ {
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
- for (k = 0; k < nparts; k++)
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
{
- int oldidx = pprune->subplan_map[k];
- int subidx;
-
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
- {
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
+ Assert(oldidx < n_total_subplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
- else if ((subidx = pprune->subpart_map[k]) >= 0)
- {
- PartitionedRelPruningData *subprune;
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
- subprune = &prunedata->partrelprunedata[subidx];
+ subprune = &prunedata->partrelprunedata[subidx];
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
}
}
}
+ }
- /*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
- */
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
-
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
- pfree(new_subplan_indexes);
- }
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
- return result;
+ pfree(new_subplan_indexes);
}
/*
@@ -2018,11 +2099,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..5b6d3eb23b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -138,30 +138,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_info,
+ &validsubplans);
appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->appendplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..9a9f29e845 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -86,29 +86,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ &validsubplans);
mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->mergeplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..7080cb25d9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -798,6 +798,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +809,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3655,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3678,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is the same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..fd5735a946 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -119,10 +119,9 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
-extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
+extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
-extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
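
For readers following along, the index re-sequencing that
PartitionPruneStateFixSubPlanMap performs can be sketched in isolation.
This is only an illustration under simplifying assumptions: plain int
arrays stand in for Bitmapset and PartitionedRelPruningData, and
remap_subplan_indexes is a hypothetical helper name that does not appear
in the patch. It uses the same trick as the patch of a temporary 1-based
array in which pruned subplans are left as 0.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * remap_subplan_indexes (hypothetical helper, not part of the patch)
 *
 * valid[i] is nonzero if old subplan i survived initial pruning.
 * subplan_map[k] holds the old subplan index of partition k, or -1 if
 * that partition has no subplan.  On return each entry holds the new
 * 0-based subplan index, or -1 if the subplan was pruned.
 *
 * Returns the number of surviving subplans, or -1 if allocation fails.
 */
int
remap_subplan_indexes(const int *valid, int n_total_subplans,
                      int *subplan_map, int nparts)
{
    int *new_subplan_indexes;
    int  newidx = 1;
    int  i;
    int  k;

    if (n_total_subplans <= 0)
        return 0;

    /* Zero-initialized, so pruned subplans keep the value 0. */
    new_subplan_indexes = calloc((size_t) n_total_subplans, sizeof(int));
    if (new_subplan_indexes == NULL)
        return -1;

    /* Assign 1-based new indexes to the survivors, in order. */
    for (i = 0; i < n_total_subplans; i++)
    {
        if (valid[i])
            new_subplan_indexes[i] = newidx++;
    }

    /* Rewrite the map; subtracting 1 turns "pruned" (0) into -1. */
    for (k = 0; k < nparts; k++)
    {
        int oldidx = subplan_map[k];

        if (oldidx >= 0)
        {
            assert(oldidx < n_total_subplans);
            subplan_map[k] = new_subplan_indexes[oldidx] - 1;
        }
    }

    free(new_subplan_indexes);
    return newidx - 1;
}
```

The sketch leaves out the present_parts and other_subplans bookkeeping
that the real function also has to redo; it only shows why the 1-based
temporary array makes the "pruned becomes -1" case fall out of a single
subtraction.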
^ permalink raw reply [nested|flat] 82+ messages in thread
* Re: generic plans and "initial" pruning
@ 2022-03-28 07:28 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-03-28 07:28 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; pgsql-hackers; David Rowley *EXTERN* <[email protected]>
On Mon, Mar 28, 2022 at 4:17 PM Amit Langote <[email protected]> wrote:
> Other than the changes mentioned above, the updated patch now contains
> a bit more commentary than earlier versions, mostly around
> AcquireExecutorLocks()'s new way of determining the set of relations
> to lock and the significantly redesigned working of the "initial"
> execution pruning.
I forgot to rebase over the latest HEAD, so here's v7. I also fixed the
_out and _read functions for PlanInitPruningOutput, which were using an
obsolete node label.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v7-0002-Add-Merge-Append.partitioned_rels.patch (17.4K, 2-v7-0002-Add-Merge-Append.partitioned_rels.patch)
From b43aac217ba51854c5a22636f94f14e81bae3991 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Thu, 24 Mar 2022 22:47:03 +0900
Subject: [PATCH v7 2/4] Add [Merge]Append.partitioned_rels
To record the RT indexes of all partitioned ancestors leading up to
leaf partitions that are appended by the node.
If a given [Merge]Append node is left out from the plan due to there
being only one element in its list of child subplans, then its
partitioned_rels set is added to PlannerGlobal.elidedAppendPartedRels
that is passed down to the executor through PlannedStmt.
There are no users for partitioned_rels and elidedAppendPartedRels
as of this commit, though a later commit will require the ability
to extract the set of relations that must be locked to make a plan
tree safe for execution by walking the plan tree itself, so having
the partitioned tables also be present in the plan tree will be
helpful. Note that currently the executor relies on the fact that
the set of relations to be locked can be obtained by simply scanning
the range table that's made available in PlannedStmt along with the
plan tree.
---
src/backend/nodes/copyfuncs.c | 3 +++
src/backend/nodes/outfuncs.c | 5 +++++
src/backend/nodes/readfuncs.c | 3 +++
src/backend/optimizer/path/joinrels.c | 9 ++++++++
src/backend/optimizer/plan/createplan.c | 18 +++++++++++++++-
src/backend/optimizer/plan/planner.c | 8 +++++++
src/backend/optimizer/plan/setrefs.c | 28 +++++++++++++++++++++++++
src/backend/optimizer/util/inherit.c | 16 ++++++++++++++
src/backend/optimizer/util/relnode.c | 20 ++++++++++++++++++
src/include/nodes/pathnodes.h | 22 +++++++++++++++++++
src/include/nodes/plannodes.h | 17 +++++++++++++++
11 files changed, 148 insertions(+), 1 deletion(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2cbd8aa0df..d4b5cc7e59 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -106,6 +106,7 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_NODE_FIELD(invalItems);
COPY_NODE_FIELD(paramExecTypes);
COPY_NODE_FIELD(utilityStmt);
+ COPY_BITMAPSET_FIELD(elidedAppendPartedRels);
COPY_LOCATION_FIELD(stmt_location);
COPY_SCALAR_FIELD(stmt_len);
@@ -253,6 +254,7 @@ _copyAppend(const Append *from)
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
@@ -281,6 +283,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index c25f0bd684..99056272f3 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -324,6 +324,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
WRITE_NODE_FIELD(utilityStmt);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
WRITE_LOCATION_FIELD(stmt_location);
WRITE_INT_FIELD(stmt_len);
}
@@ -443,6 +444,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -460,6 +462,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -2333,6 +2336,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_BOOL_FIELD(parallelModeOK);
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_CHAR_FIELD(maxParallelHazard);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
}
static void
@@ -2444,6 +2448,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_BOOL_FIELD(partbounds_merged);
WRITE_BITMAPSET_FIELD(live_parts);
WRITE_BITMAPSET_FIELD(all_partrels);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index e0b3ad1ed2..7536f216bd 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1662,6 +1662,7 @@ _readPlannedStmt(void)
READ_NODE_FIELD(invalItems);
READ_NODE_FIELD(paramExecTypes);
READ_NODE_FIELD(utilityStmt);
+ READ_BITMAPSET_FIELD(elidedAppendPartedRels);
READ_LOCATION_FIELD(stmt_location);
READ_INT_FIELD(stmt_len);
@@ -1784,6 +1785,7 @@ _readAppend(void)
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
@@ -1806,6 +1808,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 9da3ff2f9a..e74d40fee3 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1549,6 +1549,15 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
child_restrictlist);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * joinrel's set.
+ */
+ joinrel->partitioned_rels =
+ bms_add_members(joinrel->partitioned_rels,
+ child_joinrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index fa069a217c..0026086591 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -26,10 +26,12 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
#include "optimizer/paramassign.h"
+#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
@@ -1331,11 +1333,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
best_path->subpaths,
prunequal);
}
-
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->partitioned_rels = bms_copy(rel->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1499,6 +1501,20 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ /*
+ * We need to explicitly add to the plan node the RT indexes of any
+ * partitioned tables whose partitions will be scanned by the nodes in
+ * 'subplans'. There can be multiple RT indexes in the set due to the
+ * partition tree being multi-level and/or this being a plan for UNION ALL
+ * over multiple partition trees. Along with scanrelids of leaf-level Scan
+ * nodes, this allows the executor to lock the full set of relations being
+ * scanned by this node.
+ *
+ * Note that 'apprelids' only contains the top-level base relation(s), so
+ * is not sufficient for the purpose.
+ */
+ node->partitioned_rels = bms_copy(rel->partitioned_rels);
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
* produce either the exact tlist or a narrow tlist, we should get rid of
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index bd09f85aea..374a9d9753 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -529,6 +529,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->paramExecTypes = glob->paramExecTypes;
/* utilityStmt should be null, but we might as well copy it */
result->utilityStmt = parse->utilityStmt;
+ result->elidedAppendPartedRels = glob->elidedAppendPartedRels;
result->stmt_location = parse->stmt_location;
result->stmt_len = parse->stmt_len;
@@ -7365,6 +7366,13 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, grouped_rel, grouped_live_children);
}
+
+ /*
+ * Input rel might be a partitioned appendrel, though grouped_rel has at
+ * this point taken its role as the appendrel owning the former's
+ * children, so copy the former's partitioned_rels set into the latter.
+ */
+ grouped_rel->partitioned_rels = bms_copy(input_rel->partitioned_rels);
}
/*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index a7b11b7f03..dbdeb8ec9d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1512,6 +1512,10 @@ set_append_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /* Fix up partitioned_rels before possibly removing the Append below. */
+ aplan->partitioned_rels = offset_relid_set(aplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the Append entirely. For this to be
* safe, there must be only one child plan and that child plan's parallel
@@ -1522,8 +1526,17 @@ set_append_references(PlannerInfo *root,
*/
if (list_length(aplan->appendplans) == 1 &&
((Plan *) linitial(aplan->appendplans))->parallel_aware == aplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned tables involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ aplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) aplan,
(Plan *) linitial(aplan->appendplans));
+ }
/*
* Otherwise, clean up the Append as needed. It's okay to do this after
@@ -1584,6 +1597,12 @@ set_mergeappend_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /*
+ * Fix up partitioned_rels before possibly removing the MergeAppend below.
+ */
+ mplan->partitioned_rels = offset_relid_set(mplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the MergeAppend entirely. For this to
* be safe, there must be only one child plan and that child plan's
@@ -1594,8 +1613,17 @@ set_mergeappend_references(PlannerInfo *root,
*/
if (list_length(mplan->mergeplans) == 1 &&
((Plan *) linitial(mplan->mergeplans))->parallel_aware == mplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned tables involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ mplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) mplan,
(Plan *) linitial(mplan->mergeplans));
+ }
/*
* Otherwise, clean up the MergeAppend as needed. It's okay to do this
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 7e134822f3..56912e4101 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -406,6 +406,14 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
childrte, childRTindex,
childrel, top_parentrc, lockmode);
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ relinfo->partitioned_rels = bms_add_members(relinfo->partitioned_rels,
+ childrelinfo->partitioned_rels);
+
/* Close child relation, but keep locks */
table_close(childrel, NoLock);
}
@@ -737,6 +745,14 @@ expand_appendrel_subquery(PlannerInfo *root, RelOptInfo *rel,
/* Child may itself be an inherited rel, either table or subquery. */
if (childrte->inh)
expand_inherited_rtentry(root, childrel, childrte, childRTindex);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ rel->partitioned_rels = bms_add_members(rel->partitioned_rels,
+ childrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..1d082a8fdd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -361,6 +361,10 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
}
}
+ /* A partitioned appendrel. */
+ if (rel->part_scheme != NULL)
+ rel->partitioned_rels = bms_copy(rel->relids);
+
/* Save the finished struct in the query's simple_rel_array */
root->simple_rel_array[relid] = rel;
@@ -729,6 +733,14 @@ build_join_rel(PlannerInfo *root,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/*
* Set the consider_parallel flag if this joinrel could potentially be
* scanned within a parallel worker. If this flag is false for either
@@ -897,6 +909,14 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/* We build the join only once. */
Assert(!find_join_rel(root, joinrel->relids));
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1f3845b3fe..5327d9ba8b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -130,6 +130,11 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
PartitionDirectory partition_directory; /* partition descriptors */
+
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
 * that have been removed from the
+ * various plan trees. */
} PlannerGlobal;
/* macro for fetching the Plan associated with a SubPlan node */
@@ -773,6 +778,23 @@ typedef struct RelOptInfo
Relids all_partrels; /* Relids set of all partition relids */
List **partexprs; /* Non-nullable partition key expressions */
List **nullable_partexprs; /* Nullable partition key expressions */
+
+ /*
+ * For an appendrel parent relation (base, join, or upper) that is
+ * partitioned, this stores the RT indexes of all the partitioned ancestors
+ * including itself that lead up to the individual leaf partitions that
+ * will be scanned to produce this relation's output rows. The relid set
+ * is copied into the resulting Append or MergeAppend plan node for
+ * allowing the executor to take appropriate locks on those relations,
+ * unless the node is deemed useless in setrefs.c due to having a single
+ * leaf subplan and thus elided from the final plan, in which case, the set
+ * is added into PlannerGlobal.elidedAppendPartedRels.
+ *
+ * Note that 'apprelids' of those nodes only contains the top-level base
+ * relation(s), so is not sufficient for said purpose.
+ */
+
+ Bitmapset *partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0b518ce6b2..bd87c35d6c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -85,6 +85,11 @@ typedef struct PlannedStmt
Node *utilityStmt; /* non-null if this is utility stmt */
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
+ * that have been removed from the
+ * various plan trees. */
+
/* statement location in source string (copied from Query) */
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
@@ -261,6 +266,12 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in appendplans.
+ */
+ Bitmapset *partitioned_rels;
} Append;
/* ----------------
@@ -281,6 +292,12 @@ typedef struct MergeAppend
bool *nullsFirst; /* NULLS FIRST/LAST directions */
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in mergeplans.
+ */
+ Bitmapset *partitioned_rels;
} MergeAppend;
/* ----------------
--
2.24.1
[application/octet-stream] v7-0003-Add-a-plan_tree_walker.patch (3.9K, 3-v7-0003-Add-a-plan_tree_walker.patch)
From 761e6c2583b37eb9d45d64de954d65d953277040 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Thu, 3 Mar 2022 16:04:13 +0900
Subject: [PATCH v7 3/4] Add a plan_tree_walker()
Like planstate_tree_walker() but for uninitialized plan trees.
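As a rough illustration of the walker convention this patch follows (the callback returns true to abort the traversal, and the tree walker itself only recurses into the children of a node the callback has already visited), here is a self-contained sketch over a toy node type. ToyPlan, toy_tree_walker(), count_walker() and count_visited() are hypothetical names made up for this example; they are not PostgreSQL APIs, and the real function also handles initPlans and the node-specific child lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a plan node; not the real Plan struct. */
typedef struct ToyPlan
{
	int			id;
	struct ToyPlan *lefttree;
	struct ToyPlan *righttree;
} ToyPlan;

typedef bool (*toy_walker_fn) (ToyPlan *plan, void *context);

/*
 * Recurse into the children of 'plan'.  The walker callback is assumed to
 * have already processed 'plan' itself.  Returns true as soon as any
 * callback invocation returns true.
 */
static bool
toy_tree_walker(ToyPlan *plan, toy_walker_fn walker, void *context)
{
	if (plan->lefttree && walker(plan->lefttree, context))
		return true;
	if (plan->righttree && walker(plan->righttree, context))
		return true;
	return false;
}

typedef struct CountContext
{
	int			visited;		/* number of nodes seen so far */
	int			stop_at_id;		/* abort the walk at this node id */
} CountContext;

/* Example callback: count nodes, aborting when stop_at_id is reached. */
static bool
count_walker(ToyPlan *plan, void *context)
{
	CountContext *cxt = (CountContext *) context;

	if (plan == NULL)
		return false;
	cxt->visited++;
	if (plan->id == cxt->stop_at_id)
		return true;
	return toy_tree_walker(plan, count_walker, context);
}

/* Build a fixed four-node tree and report how many nodes were visited. */
static int
count_visited(int stop_at_id)
{
	ToyPlan		leaf = {4, NULL, NULL};
	ToyPlan		left = {2, &leaf, NULL};
	ToyPlan		right = {3, NULL, NULL};
	ToyPlan		root = {1, &left, &right};
	CountContext cxt = {0, stop_at_id};

	(void) count_walker(&root, &cxt);
	return cxt.visited;
}
```

Calling the callback on the root, rather than calling the tree walker directly, is what makes the root itself get visited; the tree walker only ever descends.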
---
src/backend/nodes/nodeFuncs.c | 116 ++++++++++++++++++++++++++++++++++
src/include/nodes/nodeFuncs.h | 3 +
2 files changed, 119 insertions(+)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 25cf282aab..5e5158ea0e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -31,6 +31,10 @@ static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
+static bool plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context);
+static bool plan_walk_members(List *plans, bool (*walker) (), void *context);
/*
@@ -4368,3 +4372,115 @@ planstate_walk_members(PlanState **planstates, int nplans,
return false;
}
+
+/*
+ * plan_tree_walker --- walk plan trees
+ *
+ * The walker has already visited the current node, and so we need only
+ * recurse into any sub-nodes it has.
+ */
+bool
+plan_tree_walker(Plan *plan,
+ bool (*walker) (),
+ void *context)
+{
+ /* Guard against stack overflow due to overly complex plan trees */
+ check_stack_depth();
+
+ /* initPlan-s */
+ if (plan_walk_subplans(plan->initPlan, walker, context))
+ return true;
+
+ /* lefttree */
+ if (outerPlan(plan))
+ {
+ if (walker(outerPlan(plan), context))
+ return true;
+ }
+
+ /* righttree */
+ if (innerPlan(plan))
+ {
+ if (walker(innerPlan(plan), context))
+ return true;
+ }
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_Append:
+ if (plan_walk_members(((Append *) plan)->appendplans,
+ walker, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (plan_walk_members(((MergeAppend *) plan)->mergeplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapAnd:
+ if (plan_walk_members(((BitmapAnd *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapOr:
+ if (plan_walk_members(((BitmapOr *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_CustomScan:
+ if (plan_walk_members(((CustomScan *) plan)->custom_plans,
+ walker, context))
+ return true;
+ break;
+ case T_SubqueryScan:
+ if (walker(((SubqueryScan *) plan)->subplan, context))
+ return true;
+ break;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Walk a list of SubPlans (or initPlans, which also use SubPlan nodes).
+ */
+static bool
+plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+ PlannedStmt *plannedstmt = (PlannedStmt *) context;
+
+ foreach(lc, plans)
+ {
+ SubPlan *sp = lfirst_node(SubPlan, lc);
+ Plan *p = list_nth(plannedstmt->subplans, sp->plan_id - 1);
+
+ if (walker(p, context))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Walk the constituent plans of an Append, MergeAppend, BitmapAnd, BitmapOr,
+ * or CustomScan node.
+ */
+static bool
+plan_walk_members(List *plans, bool (*walker) (), void *context)
+{
+ ListCell *lc;
+
+ foreach(lc, plans)
+ {
+ if (walker(lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 93c60bde66..fca107ad65 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
--
2.24.1
[application/octet-stream] v7-0001-Some-refactoring-of-runtime-pruning-code.patch (26.5K, 4-v7-0001-Some-refactoring-of-runtime-pruning-code.patch)
From 60ec0ebb911a2c7c8cc13ea9f96e1fb2038842a0 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 2 Mar 2022 15:17:55 +0900
Subject: [PATCH v7 1/4] Some refactoring of runtime pruning code
This does two things mainly:
* Move the execution pruning initialization steps that are common
between both ExecInitAppend() and ExecInitMergeAppend() into a new
function ExecInitPartitionPruning() defined in execPartition.c.
Thus, ExecCreatePartitionPruneState() and
ExecFindInitialMatchingSubPlans() need not be exported.
* Add an ExprContext field to PartitionPruneContext to remove the
implicit assumption in the runtime pruning code that the ExprContext
to use to compute pruning expressions that need one can always rely
on the PlanState providing it. A future patch will allow runtime
pruning (at least the initial pruning steps) to be performed without
the corresponding PlanState yet having been created, so this will
help.
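One of the steps consolidated into ExecInitPartitionPruning() is re-sequencing subplan indexes after initial pruning has removed some subplans. A minimal standalone sketch of that remapping idea follows, using plain arrays in place of Bitmapsets. The names remap_subplan_indexes() and remapped_entry() are hypothetical and exist only for this illustration; the patch's own routine also rebuilds present_parts and other_subplans, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of the patch.
 *
 * surviving[i] tells whether old subplan i survived initial pruning.
 * subplan_map[k] holds the old subplan index of partition k, or -1 if the
 * partition has no subplan.  On return, each entry holds the new index
 * among the surviving subplans, or -1 if its subplan was pruned.
 */
static void
remap_subplan_indexes(const bool *surviving, int n_total_subplans,
					  int *subplan_map, int nparts)
{
	/* 1-based new indexes; 0 marks a pruned subplan */
	int		   *new_indexes = calloc((size_t) n_total_subplans, sizeof(int));
	int			newidx = 1;
	int			i;

	if (new_indexes == NULL && n_total_subplans > 0)
		abort();

	for (i = 0; i < n_total_subplans; i++)
	{
		if (surviving[i])
			new_indexes[i] = newidx++;
	}

	for (i = 0; i < nparts; i++)
	{
		int			oldidx = subplan_map[i];

		/* Pruned subplans map to 0 - 1 = -1 */
		if (oldidx >= 0)
			subplan_map[i] = new_indexes[oldidx] - 1;
	}

	free(new_indexes);
}

/* Run the remap on a fixed example and return entry k of the result. */
static int
remapped_entry(int k)
{
	bool		surviving[4] = {false, true, false, true};
	int			subplan_map[5] = {0, 1, -1, 2, 3};

	remap_subplan_indexes(surviving, 4, subplan_map, 5);
	return subplan_map[k];
}
```

Using 1-based values in the temporary array lets zero-initialized memory stand for "pruned", so a single subtraction yields either the new 0-based index or -1.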
---
src/backend/executor/execPartition.c | 340 ++++++++++++++++---------
src/backend/executor/nodeAppend.c | 33 +--
src/backend/executor/nodeMergeAppend.c | 32 +--
src/backend/partitioning/partprune.c | 20 +-
src/include/executor/execPartition.h | 9 +-
src/include/partitioning/partprune.h | 2 +
6 files changed, 252 insertions(+), 184 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 90ed1485d1..7ff5a95f05 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,11 +182,18 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
bool *isnull,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
+static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *partitionpruneinfo);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
+static void PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1485,30 +1492,86 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
*
* Functions:
*
- * ExecCreatePartitionPruneState:
+ * ExecInitPartitionPruning:
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
- * returned by the partition pruning code into subplan indexes.
- *
- * ExecFindInitialMatchingSubPlans:
- * Returns indexes of matching subplans. Partition pruning is attempted
- * without any evaluation of expressions containing PARAM_EXEC Params.
- * This function must be called during executor startup for the parent
- * plan before the subplans themselves are initialized. Subplans which
- * are found not to match by this function must be removed from the
- * plan's list of subplans during execution, as this function performs a
- * remap of the partition index to subplan index map and the newly
- * created map provides indexes only for subplans which remain after
- * calling this function.
+ * returned by the partition pruning code into subplan indexes. Also
+ * determines the set of initially valid subplans by performing initial
+ * pruning steps; the caller, such as ExecInitAppend, need initialize only
+ * those subplans. Maps in PartitionPruneState are updated to account
+ * for initial pruning having eliminated some of the subplans, if any.
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
- * expressions. This function can only be called during execution and
- * must be called again each time the value of a Param listed in
- * PartitionPruneState's 'execparamids' changes.
+ * expressions, that is, using execution pruning steps. This function can
+ * only be called during execution and must be called again each time
+ * the value of a Param listed in PartitionPruneState's 'execparamids'
+ * changes.
*-------------------------------------------------------------------------
*/
+/*
+ * ExecInitPartitionPruning
+ * Initialize data structure needed for run-time partition pruning
+ *
+ * Initial pruning can be done immediately, so it is done here if needed and
+ * the set of indexes of surviving partition subplans is returned in the
+ * output parameter *initially_valid_subplans.
+ *
+ * If subplans are indeed pruned, subplan_map arrays contained in the returned
+ * PartitionPruneState are re-sequenced to not count those, though only if the
+ * maps will be needed for subsequent execution pruning passes.
+ */
+PartitionPruneState *
+ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans)
+{
+ PartitionPruneState *prunestate;
+ EState *estate = planstate->state;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /*
+ * Create the working data structure for pruning.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+
+ /*
+ * Perform an initial partition prune, if required.
+ */
+ if (prunestate->do_initial_prune)
+ {
+ /* Determine which subplans survive initial pruning */
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ }
+ else
+ {
+ /* We'll need to initialize all subplans */
+ Assert(n_total_subplans > 0);
+ *initially_valid_subplans = bms_add_range(NULL, 0,
+ n_total_subplans - 1);
+ }
+
+ /*
+ * Re-sequence subplan indexes contained in prunestate to account for any
+ * that were removed above due to initial pruning.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in prunestate, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (prunestate->do_exec_prune &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ PartitionPruneStateFixSubPlanMap(prunestate,
+ *initially_valid_subplans,
+ n_total_subplans);
+
+ return prunestate;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
@@ -1527,7 +1590,7 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
*/
-PartitionPruneState *
+static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
PartitionPruneInfo *partitionpruneinfo)
{
@@ -1536,6 +1599,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
int n_part_hierarchies;
ListCell *lc;
int i;
+ ExprContext *econtext = planstate->ps_ExprContext;
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1709,7 +1773,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
@@ -1718,7 +1783,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -1746,7 +1812,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1767,6 +1834,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1795,8 +1863,20 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ /*
+ * When planstate is NULL, pruning_steps is known not to
+ * contain any expressions that depend on the parent plan.
+ * Information about any available EXTERN parameters must be
+ * passed explicitly in that case, which the caller must
+ * have made available via econtext.
+ */
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1809,18 +1889,11 @@ ExecInitPruningContext(PartitionPruneContext *context,
* pruning, disregarding any pruning constraints involving PARAM_EXEC
* Params.
*
- * If additional pruning passes will be required (because of PARAM_EXEC
- * Params), we must also update the translation data that allows conversion
- * of partition indexes into subplan indexes to account for the unneeded
- * subplans having been removed.
- *
* Must only be called once per 'prunestate', and only if initial pruning
* is required.
- *
- * 'nsubplans' must be passed as the total number of unpruned subplans.
*/
-Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+static Bitmapset *
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1845,14 +1918,20 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
PartitionedRelPruningData *pprune;
prunedata = prunestate->partprunedata[i];
+
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
pprune = &prunedata->partrelprunedata[0];
/* Perform pruning without using PARAM_EXEC Params */
find_matching_subplans_recurse(prunedata, pprune, true, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->initial_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -1865,118 +1944,120 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
MemoryContextReset(prunestate->prune_context);
+ return result;
+}
+
+/*
+ * PartitionPruneStateFixSubPlanMap
+ * Fix mapping of partition indexes to subplan indexes contained in
+ * prunestate by considering the new list of subplans that survived
+ * initial pruning
+ *
+ * Subplans that were previously indexed 0..(n_total_subplans - 1) are
+ * re-indexed to the range 0..(num(initially_valid_subplans) - 1).
+ */
+static void
+PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans)
+{
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
+ int i;
+ int newidx;
+
/*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
- *
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
*/
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
{
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
+ Assert(i < n_total_subplans);
+ new_subplan_indexes[i] = newidx++;
+ }
- /*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
- */
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
- {
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
+ /*
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
+ */
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
/*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
*/
- for (i = 0; i < prunestate->num_partprunedata; i++)
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
{
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
- /*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
- */
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
- {
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
+ for (k = 0; k < nparts; k++)
+ {
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
- for (k = 0; k < nparts; k++)
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
{
- int oldidx = pprune->subplan_map[k];
- int subidx;
-
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
- {
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
+ Assert(oldidx < n_total_subplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
- else if ((subidx = pprune->subpart_map[k]) >= 0)
- {
- PartitionedRelPruningData *subprune;
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
- subprune = &prunedata->partrelprunedata[subidx];
+ subprune = &prunedata->partrelprunedata[subidx];
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
}
}
}
+ }
- /*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
- */
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
-
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
- pfree(new_subplan_indexes);
- }
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
- return result;
+ pfree(new_subplan_indexes);
}
/*
@@ -2018,11 +2099,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..5b6d3eb23b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -138,30 +138,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_info,
+ &validsubplans);
appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->appendplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..9a9f29e845 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -86,29 +86,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ &validsubplans);
mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->mergeplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..7080cb25d9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -798,6 +798,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +809,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3655,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3678,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is the same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..fd5735a946 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -119,10 +119,9 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
-extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
+extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
-extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
[application/octet-stream] v7-0004-Optimize-AcquireExecutorLocks-to-skip-pruned-part.patch (94.2K, 5-v7-0004-Optimize-AcquireExecutorLocks-to-skip-pruned-part.patch)
From 14d951ca644860eec6d72ac03e3a95b12373938b Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v7 4/4] Optimize AcquireExecutorLocks() to skip pruned
partitions
Instead of locking all relations listed in the range table,
AcquireExecutorLocks() now handles the case where the PlannedStmt
indicates that some nodes in the plan tree can do partition pruning
without depending on execution having started (so-called "initial"
pruning) by calling the new executor function ExecutorGetLockRels(),
which returns the set of relations (their RT indexes) to be locked,
not including those scanned by the subplans that are pruned.
The result of pruning done this way must be remembered and reused
during actual execution of the plan. That is done by creating a
PlanInitPruningOutput node for each plan node that undergoes
pruning; the set of those for the whole plan tree is added to an
ExecLockRelsInfo, which also stores the bitmapset of RT indexes of
relations that are actually locked by AcquireExecutorLocks().
ExecLockRelsInfos are passed down the executor alongside the
PlannedStmts. This arrangement ensures that the executor doesn't
accidentally try to process plan tree subnodes that have been
deemed pruned by AcquireExecutorLocks().
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 13 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 17 +-
src/backend/executor/README | 24 +++
src/backend/executor/execMain.c | 202 ++++++++++++++++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 224 ++++++++++++++++++----
src/backend/executor/execUtils.c | 8 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 52 ++++-
src/backend/executor/nodeMergeAppend.c | 52 ++++-
src/backend/executor/nodeModifyTable.c | 25 +++
src/backend/executor/spi.c | 14 +-
src/backend/nodes/copyfuncs.c | 49 ++++-
src/backend/nodes/outfuncs.c | 39 ++++
src/backend/nodes/readfuncs.c | 37 ++++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 6 +
src/backend/partitioning/partprune.c | 37 +++-
src/backend/tcop/postgres.c | 15 +-
src/backend/tcop/pquery.c | 21 ++-
src/backend/utils/cache/plancache.c | 252 ++++++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execPartition.h | 2 +
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 2 +
src/include/executor/nodeAppend.h | 1 +
src/include/executor/nodeMergeAppend.h | 1 +
src/include/executor/nodeModifyTable.h | 1 +
src/include/nodes/execnodes.h | 96 ++++++++++
src/include/nodes/nodes.h | 5 +
src/include/nodes/pathnodes.h | 4 +
src/include/nodes/plannodes.h | 15 ++
src/include/tcop/tcopprot.h | 2 +-
src/include/utils/plancache.h | 6 +
src/include/utils/portal.h | 5 +
41 files changed, 1174 insertions(+), 104 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9f632285b6..1f1a44b9bb 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, execlockrelsinfo, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..008b8ce0e9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -741,8 +741,10 @@ execute_sql_string(const char *sql)
RawStmt *parsetree = lfirst_node(RawStmt, lc1);
MemoryContext per_parsetree_context,
oldcontext;
- List *stmt_list;
- ListCell *lc2;
+ List *stmt_list,
+ *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
/*
* We do the work for each parsetree in a short-lived context, to
@@ -762,11 +764,13 @@ execute_sql_string(const char *sql)
NULL,
0,
NULL);
- stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
+ stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL,
+ &execlockrelsinfo_list);
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
CommandCounterIncrement();
@@ -777,6 +781,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ execlockrelsinfo,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 05e7b60059..4ef44aaf23 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 9902c5c566..85e73ddded 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL), /* no ExecLockRelsInfo to pass */
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..bbbf8bbcbd 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *plan_execlockrelsinfo_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -195,6 +196,7 @@ ExecuteQuery(ParseState *pstate,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* DO NOT add any logic that could possibly throw an error between
@@ -204,7 +206,7 @@ ExecuteQuery(ParseState *pstate,
NULL,
query_string,
entry->plansource->commandTag,
- plan_list,
+ plan_list, plan_execlockrelsinfo_list,
cplan);
/*
@@ -576,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *plan_execlockrelsinfo_list;
+ ListCell *p,
+ *pe;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -632,15 +636,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pe, plan_execlockrelsinfo_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, pe);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, execlockrelsinfo, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index bf5e70860d..9720d0ac2c 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,27 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan has nodes that contain so-called initial pruning steps (a
+subset of execution pruning steps that do not depend on full-fledged execution
+having started), they are performed at this point to figure out the minimal
+set of child subplans that satisfy those pruning instructions and the result
+of performing that pruning is saved in a data structure that gets passed to
+the executor alongside the plan tree. Relations scanned by only those
+surviving subplans are then locked while those scanned by the pruned subplans
+are not, even though the pruned subplans themselves are not removed from the
+plan tree. So, it is imperative that the executor and any third-party code
+invoked by it that gets passed the plan tree look at the initial pruning result
+made available via the aforementioned data structure to determine whether or
+not a particular subplan is valid. (The data structure basically consists of
+an array of PlanInitPruningOutput nodes containing one element for each node
+of the plan tree indexable using plan_node_id of the individual plan nodes,
+where each node contains a bitmapset of indexes of unpruned child subplans of
+a given node.)
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -247,6 +268,9 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorGetLockRels ] --- an optional step to walk over the plan tree
+ to produce an ExecLockRelsInfo to be passed to CreateQueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 473d2e00a2..1ddd1dfb83 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,11 +49,15 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/nodeAppend.h"
+#include "executor/nodeMergeAppend.h"
+#include "executor/nodeModifyTable.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -101,9 +105,205 @@ static char *ExecBuildSlotValueDescription(Oid reloid,
Bitmapset *modifiedCols,
int maxfieldlen);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static bool ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorGetLockRels
+ *
+ * Figure out the minimal set of relations to lock to be able to safely
+ * execute a given plan
+ *
+ * This ignores the relations scanned by child subplans that are pruned away
+ * after performing initial pruning steps present in the plan using the
+ * provided set of EXTERN parameters.
+ *
+ * Along with the set of RT indexes of relations that must be locked, the
+ * returned struct also contains an array of PlanInitPruningOutput nodes each
+ * of which contains the result of initial pruning for a given plan node, which
+ * is basically a bitmapset of the indexes of surviving child subplans. Each
+ * plan node in the tree that undergoes pruning will have an element in the
+ * array.
+ *
+ * Note that while relations scanned by the subplans that are pruned will not
+ * be locked, the subplans themselves are left as-is in the plan tree, assuming
+ * anything that reads the plan tree during execution knows to ignore them by
+ * looking at the PlanInitPruningOutput's list of valid subplans.
+ *
+ * Partitioned tables mentioned in PartitionedRelPruneInfo nodes that drive
+ * the pruning will be locked before doing the pruning and also added to the
+ * returned set.
+ */
+ExecLockRelsInfo *
+ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ int numPlanNodes = plannedstmt->numPlanNodes;
+ ExecGetLockRelsContext context;
+ ExecLockRelsInfo *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ context.stmt = plannedstmt;
+ context.params = params;
+
+ /*
+ * Go walk all the plan tree(s) present in the PlannedStmt, filling
+ * context.lockrels with only the relations from plan nodes that
+ * survive initial pruning and also the tables mentioned in
+ * partitioned_rels sets found in the plan.
+ */
+ context.lockrels = NULL;
+ context.initPruningOutputs = NIL;
+ context.ipoIndexes = palloc0(sizeof(int) * numPlanNodes);
+
+ /* All the subplans. */
+ foreach(lc, plannedstmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ (void) ExecGetLockRels(subplan, &context);
+ }
+
+ /* And the main tree. */
+ (void) ExecGetLockRels(plannedstmt->planTree, &context);
+
+ /*
+ * Also be sure to lock partitioned relations from any [Merge]Append nodes
+ * that were originally present but were ultimately left out from the plan
+ * due to being deemed no-op nodes.
+ */
+ context.lockrels = bms_add_members(context.lockrels,
+ plannedstmt->elidedAppendPartedRels);
+
+ result = makeNode(ExecLockRelsInfo);
+ result->lockrels = context.lockrels;
+ result->numPlanNodes = numPlanNodes;
+ result->initPruningOutputs = context.initPruningOutputs;
+ result->ipoIndexes = context.ipoIndexes;
+
+ return result;
+}
+
+/* ------------------------------------------------------------------------
+ * ExecGetLockRels
+ * Adds all the relations that will be scanned by 'node' and its child
+ * plans to context->lockrels after taking into account the effect
+ * of performing initial pruning, if any
+ *
+ * context->stmt gives the PlannedStmt being inspected to access the plan's
+ * range table if needed and context->params the set of EXTERN parameters
+ * available to evaluate pruning parameters.
+ *
+ * If initial pruning is done, a PlanInitPruningOutput node containing the
+ * result of pruning will be stored in context->initPruningOutputs that will
+ * be made available to the executor to reuse.
+ * ------------------------------------------------------------------------
+ */
+bool
+ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context)
+{
+ /* Do nothing when we get to the end of a leaf of the tree. */
+ if (node == NULL)
+ return true;
+
+ /* Make sure there's enough stack available. */
+ check_stack_depth();
+
+ switch (nodeTag(node))
+ {
+ /* Currently, only these two nodes have prunable child subplans. */
+ case T_Append:
+ if (ExecGetAppendLockRels((Append *) node, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (ExecGetMergeAppendLockRels((MergeAppend *) node,
+ context))
+ return true;
+ break;
+
+ /*
+ * And these manipulate relations that must be added to context->lockrels.
+ */
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapIndexScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ case T_ForeignScan:
+ case T_SubqueryScan:
+ case T_CustomScan:
+ if (ExecGetScanLockRels((Scan *) node, context))
+ return true;
+ break;
+ case T_ModifyTable:
+ if (ExecGetModifyTableLockRels((ModifyTable *) node, context))
+ return true;
+ /* plan_tree_walker() will visit the subplan (outerNode) */
+ break;
+
+ default:
+ break;
+ }
+
+ /* Recurse to subnodes. */
+ return plan_tree_walker(node, ExecGetLockRels, (void *) context);
+}
+
+/*
+ * ExecGetScanLockRels
+ * Do ExecGetLockRels()'s work for a leaf Scan node
+ */
+static bool
+ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context)
+{
+ switch (nodeTag(scan))
+ {
+ case T_ForeignScan:
+ {
+ ForeignScan *fscan = (ForeignScan *) scan;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ fscan->fs_relids);
+ }
+ break;
+
+ case T_SubqueryScan:
+ {
+ SubqueryScan *sscan = (SubqueryScan *) scan;
+
+ (void) ExecGetLockRels((Plan *) sscan->subplan, context);
+ }
+ break;
+
+ case T_CustomScan:
+ {
+ CustomScan *cscan = (CustomScan *) scan;
+ ListCell *lc;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ cscan->custom_relids);
+ foreach(lc, cscan->custom_plans)
+ {
+ (void) ExecGetLockRels((Plan *) lfirst(lc), context);
+ }
+ }
+ break;
+
+ default:
+ context->lockrels = bms_add_member(context->lockrels,
+ scan->scanrelid);
+ break;
+ }
+
+ return true;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -805,6 +1005,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ ExecLockRelsInfo *execlockrelsinfo = queryDesc->execlockrelsinfo;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -824,6 +1025,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_execlockrelsinfo = execlockrelsinfo;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9a0d5d59ef..fb6dbd298a 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_EXECLOCKRELSINFO UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
@@ -596,12 +598,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *execlockrelsinfo_data;
+ char *execlockrelsinfo_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int execlockrelsinfo_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +635,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ execlockrelsinfo_data = nodeToString(estate->es_execlockrelsinfo);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +662,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized ExecLockRelsInfo. */
+ execlockrelsinfo_len = strlen(execlockrelsinfo_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, execlockrelsinfo_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +761,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized ExecLockRelsInfo */
+ execlockrelsinfo_space = shm_toc_allocate(pcxt->toc, execlockrelsinfo_len);
+ memcpy(execlockrelsinfo_space, execlockrelsinfo_data, execlockrelsinfo_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ execlockrelsinfo_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1248,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *execlockrelsinfospace;
char *paramspace;
PlannedStmt *pstmt;
+ ExecLockRelsInfo *execlockrelsinfo;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1262,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied ExecLockRelsInfo. */
+ execlockrelsinfospace = shm_toc_lookup(toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ false);
+ execlockrelsinfo = (ExecLockRelsInfo *) stringToNode(execlockrelsinfospace);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
 paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, execlockrelsinfo,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 7ff5a95f05..fddc97280e 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -24,6 +24,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -183,8 +184,13 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
-static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1483,8 +1489,9 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or even before during ExecutorGetLockRels().
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1496,10 +1503,17 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
* returned by the partition pruning code into subplan indexes. Also
- * determines the set of initially valid subplans by performing initial
- * pruning steps, only which need be initialized by the caller such as
- * ExecInitAppend. Maps in PartitionPruneState are updated to account
- * for initial pruning having eliminated some of the subplans, if any.
+ * determines the set of initially valid subplans by either looking it
+ * up in the plan node's PlanInitPruningOutput, if one is found in
+ * EState.es_execlockrelsinfo, or by performing initial pruning steps.
+ * Only the subplans included in that set need be initialized by the caller
+ * such as ExecInitAppend. Maps in PartitionPruneState are updated to
+ * account for initial pruning having eliminated some of the subplans,
+ * if any.
+ *
+ * ExecGetLockRelsDoInitialPruning:
+ * Do initial pruning as part of ExecGetLockRels() on the parent plan
+ * node
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
@@ -1514,9 +1528,10 @@ adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri)
* ExecInitPartitionPruning
* Initialize data structure needed for run-time partition pruning
*
- * Initial pruning can be done immediately, so it is done here if needed and
- * the set of surviving partition subplans' indexes are added to the output
- * parameter *initially_valid_subplans.
+ * Initial pruning can be done immediately, so it is done here unless it has
+ * already been done by ExecGetLockRelsDoInitialPruning(), and the set of
+ * surviving partition subplans' indexes are added to the output parameter
+ * *initially_valid_subplans.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1530,22 +1545,57 @@ ExecInitPartitionPruning(PlanState *planstate,
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ Plan *plan = planstate->plan;
+ PlanInitPruningOutput *initPruningOutput = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /* Retrieve the parent plan's PlanInitPruningOutput, if any. */
+ if (estate->es_execlockrelsinfo)
+ {
+ initPruningOutput = (PlanInitPruningOutput *)
+ ExecFetchPlanInitPruningOutput(estate->es_execlockrelsinfo, plan);
- /*
- * Create the working data structure for pruning.
- */
- prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+ Assert(initPruningOutput != NULL &&
+ IsA(initPruningOutput, PlanInitPruningOutput));
+ /* No need to do initial pruning again, only exec pruning. */
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
+
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PlanInitPruningOutput.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo,
+ initPruningOutput == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune, if required.
*/
- if (prunestate->do_initial_prune)
+ if (initPruningOutput)
+ {
+ /* ExecGetLockRelsDoInitialPruning() already did it for us! */
+ *initially_valid_subplans = initPruningOutput->initially_valid_subplans;
+ }
+ else if (prunestate && prunestate->do_initial_prune)
{
/* Determine which subplans survive initial pruning */
- *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate,
+ pruneinfo);
}
else
{
@@ -1563,7 +1613,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* invalid data in prunestate, because that data won't be consulted again
* (cf initial Assert in ExecFindMatchingSubPlans).
*/
- if (prunestate->do_exec_prune &&
+ if (prunestate && prunestate->do_exec_prune &&
bms_num_members(*initially_valid_subplans) < n_total_subplans)
PartitionPruneStateFixSubPlanMap(prunestate,
*initially_valid_subplans,
@@ -1572,12 +1622,75 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecGetLockRelsDoInitialPruning
+ * Perform initial pruning as part of doing ExecGetLockRels() on the parent
+ * plan node
+ */
+Bitmapset *
+ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo)
+{
+ List *rtable = context->stmt->rtable;
+ ParamListInfo params = context->params;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ PlanInitPruningOutput *initPruningOutput;
+
+ /*
+ * A temporary context to allocate stuff needed to run the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so must create
+ * a standalone ExprContext to evaluate pruning expressions, equipped with
+ * the information about the EXTERN parameters that the caller passed us.
+ * Note that that's okay because the initial pruning steps do not contain
+ * anything that requires the execution to have started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = ExecCreatePartitionPruneState(NULL, pruneinfo,
+ true, false,
+ rtable, econtext,
+ pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the pruning and populate a PlanInitPruningOutput for this node. */
+ initPruningOutput = makeNode(PlanInitPruningOutput);
+ initPruningOutput->initially_valid_subplans =
+ ExecFindInitialMatchingSubPlans(prunestate, pruneinfo);
+ ExecStorePlanInitPruningOutput(context, initPruningOutput, plan);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return initPruningOutput->initially_valid_subplans;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
* ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'partitionpruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1592,19 +1705,20 @@ ExecInitPartitionPruning(PlanState *planstate,
*/
static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1655,19 +1769,48 @@ ExecCreatePartitionPruneState(PlanState *planstate,
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as, when called during
+ * ExecutorGetLockRels() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1769,7 +1912,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1779,7 +1922,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -1893,7 +2036,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
* is required.
*/
static Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1903,8 +2047,8 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
Assert(prunestate->do_initial_prune);
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * Switch to a temp context to avoid leaking memory in the longer-term
+ * memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
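The ExecInitPartitionPruning() hunk above decides whether a PartitionPruneState needs to be built at all, depending on whether initial pruning was already done while collecting the relations to lock. A minimal standalone sketch of that decision follows; ToyPruneInfo and toy_need_prunestate() are hypothetical names used only for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two flags added to PartitionPruneInfo. */
typedef struct ToyPruneInfo
{
    bool needs_init_pruning;
    bool needs_exec_pruning;
} ToyPruneInfo;

/*
 * Return true if a pruning state must be built during executor startup.
 *
 * have_init_output means the result of initial pruning is already
 * available, so only exec-time pruning can still require a state.
 */
static bool
toy_need_prunestate(const ToyPruneInfo *info, bool have_init_output)
{
    assert(info != NULL);

    if (have_init_output)
        return info->needs_exec_pruning;

    return info->needs_init_pruning || info->needs_exec_pruning;
}
```

With only initial pruning steps and a precomputed result, the sketch returns false, which corresponds to prunestate staying NULL in the hunk above.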
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..7246f9175f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_execlockrelsinfo = NULL;
estate->es_junkFilter = NULL;
@@ -785,6 +786,13 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ /*
+ * Cross-check that AcquireExecutorLocks() hasn't missed any relation
+ * that it should have locked.
+ */
+ Assert(estate->es_execlockrelsinfo == NULL ||
+ bms_is_member(rti, estate->es_execlockrelsinfo->lockrels));
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5b6d3eb23b..9c6f907687 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -94,6 +94,55 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
+/* ----------------------------------------------------------------
+ * ExecGetAppendLockRels
+ * Do ExecGetLockRels()'s work for an Append plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this Append.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->appendplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitAppend
*
@@ -155,7 +204,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
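ExecGetAppendLockRels() above recurses only into the subplans that survive initial pruning, so relations scanned by pruned subplans never reach the lock set. The following standalone sketch shows that loop shape with a toy 64-bit set standing in for a Bitmapset; ToySet, toy_next_member() and toy_collect_lockrels() are hypothetical, and each subplan is assumed to scan exactly one relation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit i set means member i. */
typedef uint64_t ToySet;

/* Return the smallest member greater than prev, or -1 if there is none. */
static int
toy_next_member(ToySet set, int prev)
{
    for (int i = prev + 1; i < 64; i++)
    {
        if (set & ((ToySet) 1 << i))
            return i;
    }
    return -1;
}

/*
 * Collect the range table indexes to lock, visiting only the subplans
 * listed in valid_subplans.  subplan_rti[i] is the RT index scanned by
 * subplan i (one relation per subplan, an assumption of this sketch).
 */
static ToySet
toy_collect_lockrels(const int *subplan_rti, int nsubplans,
                     ToySet valid_subplans)
{
    ToySet lockrels = 0;
    int    i = -1;

    while ((i = toy_next_member(valid_subplans, i)) >= 0)
    {
        assert(i < nsubplans);
        assert(subplan_rti[i] > 0 && subplan_rti[i] < 64);
        lockrels |= (ToySet) 1 << subplan_rti[i];
    }

    return lockrels;
}
```

For four subplans scanning RT indexes 2, 3, 4 and 5, with only subplans 1 and 3 surviving, the resulting set contains just 3 and 5.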
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 9a9f29e845..4b04fcdbc2 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -54,6 +54,55 @@ typedef int32 SlotNumber;
static TupleTableSlot *ExecMergeAppend(PlanState *pstate);
static int heap_compare_slots(Datum a, Datum b, void *arg);
+/* ----------------------------------------------------------------
+ * ExecGetMergeAppendLockRels
+ * Do ExecGetLockRels()'s work for a MergeAppend plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this MergeAppend.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->mergeplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitMergeAppend
@@ -103,7 +152,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 701fe05296..23df3efef0 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3008,6 +3008,31 @@ ExecLookupResultRelByOid(ModifyTableState *node, Oid resultoid,
return NULL;
}
+/*
+ * ExecGetModifyTableLockRels
+ * Do ExecGetLockRels()'s work for a ModifyTable plan
+ */
+bool
+ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context)
+{
+ ListCell *lc;
+
+ /* First add the result relation RTIs mentioned in the node. */
+ if (plan->rootRelation > 0)
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->rootRelation);
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->nominalRelation);
+ foreach(lc, plan->resultRelations)
+ {
+ context->lockrels = bms_add_member(context->lockrels,
+ lfirst_int(lc));
+ }
+
+ /* Tell the caller to recurse to the subplan (outerPlan(plan)). */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitModifyTable
* ----------------------------------------------------------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index a82e986667..2107009591 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *execlockrelsinfo_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1659,6 +1660,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
if (!plan->saved)
{
@@ -1670,6 +1672,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
+ execlockrelsinfo_list = copyObject(execlockrelsinfo_list);
MemoryContextSwitchTo(oldcontext);
ReleaseCachedPlan(cplan, NULL);
cplan = NULL; /* portal shouldn't depend on cplan */
@@ -1683,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ execlockrelsinfo_list,
cplan);
/*
@@ -2473,7 +2477,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2552,6 +2558,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2589,9 +2596,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2671,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, execlockrelsinfo,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index d4b5cc7e59..631727d310 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -68,6 +68,13 @@
} \
} while (0)
+/* Copy a field that is an array with numElem ints */
+#define COPY_INT_ARRAY(fldname, numElem) \
+ do { \
+ newnode->fldname = (numElem) > 0 ? palloc((numElem) * sizeof(int)) : NULL; \
+ memcpy(newnode->fldname, from->fldname, sizeof(int) * (numElem)); \
+ } while (0)
+
/* Copy a parse location field (for Copy, this is same as scalar case) */
#define COPY_LOCATION_FIELD(fldname) \
(newnode->fldname = from->fldname)
@@ -94,8 +101,10 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_SCALAR_FIELD(numPlanNodes);
COPY_NODE_FIELD(rtable);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
@@ -1281,6 +1290,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -5137,6 +5148,33 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static ExecLockRelsInfo *
+_copyExecLockRelsInfo(const ExecLockRelsInfo *from)
+{
+ ExecLockRelsInfo *newnode = makeNode(ExecLockRelsInfo);
+
+ COPY_BITMAPSET_FIELD(lockrels);
+ COPY_SCALAR_FIELD(numPlanNodes);
+ COPY_NODE_FIELD(initPruningOutputs);
+ COPY_INT_ARRAY(ipoIndexes, from->numPlanNodes);
+
+ return newnode;
+}
+
+static PlanInitPruningOutput *
+_copyPlanInitPruningOutput(const PlanInitPruningOutput *from)
+{
+ PlanInitPruningOutput *newnode = makeNode(PlanInitPruningOutput);
+
+ COPY_BITMAPSET_FIELD(initially_valid_subplans);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -5191,7 +5229,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -6176,6 +6213,16 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ retval = _copyExecLockRelsInfo(from);
+ break;
+ case T_PlanInitPruningOutput:
+ retval = _copyPlanInitPruningOutput(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
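The COPY_INT_ARRAY macro added above duplicates a plain int array whose length is stored in another field of the node. A rough standalone equivalent is sketched below, using malloc() in place of palloc(); toy_copy_int_array() is a hypothetical helper, and returning early for a non-positive length (so that memcpy() is never handed a NULL destination) is a choice made for this sketch rather than something taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a freshly allocated copy of the first num_elem ints of src,
 * or NULL when there is nothing to copy.  The caller frees the result.
 */
static int *
toy_copy_int_array(const int *src, int num_elem)
{
    int *dst;

    if (num_elem <= 0)
        return NULL;

    assert(src != NULL);

    dst = malloc((size_t) num_elem * sizeof(int));
    if (dst == NULL)
        abort();            /* palloc() would raise an error instead */

    memcpy(dst, src, (size_t) num_elem * sizeof(int));
    return dst;
}
```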
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 99056272f3..f361d2e2bc 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,8 +312,10 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_INT_FIELD(numPlanNodes);
WRITE_NODE_FIELD(rtable);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
@@ -1007,6 +1009,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2747,6 +2751,31 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outExecLockRelsInfo(StringInfo str, const ExecLockRelsInfo *node)
+{
+ WRITE_NODE_TYPE("EXECLOCKRELSINFO");
+
+ WRITE_BITMAPSET_FIELD(lockrels);
+ WRITE_INT_FIELD(numPlanNodes);
+ WRITE_NODE_FIELD(initPruningOutputs);
+ WRITE_INT_ARRAY(ipoIndexes, node->numPlanNodes);
+}
+
+static void
+_outPlanInitPruningOutput(StringInfo str, const PlanInitPruningOutput *node)
+{
+ WRITE_NODE_TYPE("PLANINITPRUNINGOUTPUT");
+
+ WRITE_BITMAPSET_FIELD(initially_valid_subplans);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4600,6 +4629,16 @@ outNode(StringInfo str, const void *obj)
_outJsonConstructorExpr(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ _outExecLockRelsInfo(str, obj);
+ break;
+ case T_PlanInitPruningOutput:
+ _outPlanInitPruningOutput(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 7536f216bd..41fc710999 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1650,8 +1650,10 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_INT_FIELD(numPlanNodes);
READ_NODE_FIELD(rtable);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
@@ -2602,6 +2604,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2771,6 +2775,35 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+/*
+ * _readExecLockRelsInfo
+ */
+static ExecLockRelsInfo *
+_readExecLockRelsInfo(void)
+{
+ READ_LOCALS(ExecLockRelsInfo);
+
+ READ_BITMAPSET_FIELD(lockrels);
+ READ_INT_FIELD(numPlanNodes);
+ READ_NODE_FIELD(initPruningOutputs);
+ READ_INT_ARRAY(ipoIndexes, local_node->numPlanNodes);
+
+ READ_DONE();
+}
+
+/*
+ * _readPlanInitPruningOutput
+ */
+static PlanInitPruningOutput *
+_readPlanInitPruningOutput(void)
+{
+ READ_LOCALS(PlanInitPruningOutput);
+
+ READ_BITMAPSET_FIELD(initially_valid_subplans);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3050,6 +3083,10 @@ parseNodeString(void)
return_value = _readJsonValueExpr();
else if (MATCH("JSONCTOREXPR", 12))
return_value = _readJsonConstructorExpr();
+ else if (MATCH("EXECLOCKRELSINFO", 16))
+ return_value = _readExecLockRelsInfo();
+ else if (MATCH("PLANINITPRUNINGOUTPUT", 21))
+ return_value = _readPlanInitPruningOutput();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 374a9d9753..329fb9d6e7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,7 +517,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->planTree = top_plan;
+ result->numPlanNodes = glob->lastPlanNodeId;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index dbdeb8ec9d..ac795ae9d9 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1561,6 +1561,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (aplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1648,6 +1651,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7080cb25d9..3322dc79f2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!needs_init_pruning)
+ needs_init_pruning = partrel_needs_init_pruning;
+ if (!needs_exec_pruning)
+ needs_exec_pruning = partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*needs_init_pruning)
+ *needs_init_pruning = (initial_pruning_steps != NIL);
+ if (!*needs_exec_pruning)
+ *needs_exec_pruning = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ba2fcfeb4a..085eb3f209 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -945,15 +945,17 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
* For normal optimizable statements, invoke the planner. For utility
* statements, just make a wrapper PlannedStmt node.
*
- * The result is a list of PlannedStmt nodes.
+ * The result is a list of PlannedStmt nodes. Also, a NULL is appended to
+ * *execlockrelsinfo_list for each PlannedStmt added to the returned list.
*/
List *
pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
- ParamListInfo boundParams)
+ ParamListInfo boundParams, List **execlockrelsinfo_list)
{
List *stmt_list = NIL;
ListCell *query_list;
+ *execlockrelsinfo_list = NIL;
foreach(query_list, querytrees)
{
Query *query = lfirst_node(Query, query_list);
@@ -977,6 +979,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
}
stmt_list = lappend(stmt_list, stmt);
+ *execlockrelsinfo_list = lappend(*execlockrelsinfo_list, NULL);
}
return stmt_list;
@@ -1080,7 +1083,8 @@ exec_simple_query(const char *query_string)
QueryCompletion qc;
MemoryContext per_parsetree_context = NULL;
List *querytree_list,
- *plantree_list;
+ *plantree_list,
+ *plantree_execlockrelsinfo_list;
Portal portal;
DestReceiver *receiver;
int16 format;
@@ -1167,7 +1171,8 @@ exec_simple_query(const char *query_string)
NULL, 0, NULL);
plantree_list = pg_plan_queries(querytree_list, query_string,
- CURSOR_OPT_PARALLEL_OK, NULL);
+ CURSOR_OPT_PARALLEL_OK, NULL,
+ &plantree_execlockrelsinfo_list);
/*
* Done with the snapshot used for parsing/planning.
@@ -1203,6 +1208,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ plantree_execlockrelsinfo_list,
NULL);
/*
@@ -1991,6 +1997,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ cplan->execlockrelsinfo_list,
cplan);
/* Done with the snapshot used for parameter I/O and parsing/planning */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5f907831a3..972ddc014e 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->execlockrelsinfo = execlockrelsinfo; /* ExecutorGetLockRels() output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * execlockrelsinfo: ExecutorGetLockRels() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, execlockrelsinfo, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -490,6 +494,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ linitial_node(ExecLockRelsInfo, portal->execlockrelsinfos),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1190,7 +1195,8 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *stmtlist_item,
+ *execlockrelsinfolist_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1211,9 +1217,12 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ forboth(stmtlist_item, portal->stmts,
+ execlockrelsinfolist_item, portal->execlockrelsinfos)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo,
+ execlockrelsinfolist_item);
/*
* If we got a cancel signal in prior command, quit
@@ -1271,7 +1280,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1280,7 +1289,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..9f5a40a0a6 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,16 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
+static void CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static List *AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams);
+static void ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,9 +792,21 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * If the CachedPlan is valid, this may in some cases call ExecutorGetLockRels
+ * on each PlannedStmt contained in it to determine the set of relations to be
+ * locked by AcquireExecutorLocks(), instead of just scanning its range table,
+ * which is done to prune away any nodes in the tree that need not be executed
+ * based on the result of initial partition pruning. Resulting
+ * ExecLockRelsInfo nodes containing the result of such pruning, allocated in
+ * a child context of the context containing the plan itself, are added into
+ * plan->execlockrelsinfo_list. The previous contents of the list from the
+ * last invocation on the same CachedPlan are deleted, because they would no
+ * longer be valid given the fresh set of parameter values which may be used
+ * as pruning parameters.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -820,13 +834,25 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *execlockrelsinfo_list;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. If ExecutorGetLockRels() asked
+ * to omit some relations because the plan nodes that scan them were
+ * found to be pruned, the executor will be informed of the omission of
+ * the plan nodes themselves, so that it doesn't accidentally try to
+ * execute those nodes, via the ExecLockRelsInfo nodes collected in the
+ * returned list that is also passed to it along with the list of
+ * PlannedStmts.
+ */
+ execlockrelsinfo_list = AcquireExecutorLocks(plan->stmt_list,
+ boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -844,11 +870,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (plan->is_valid)
{
/* Successfully revalidated and locked the query. */
+
+ /* Remember ExecLockRelsInfos in the CachedPlan. */
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
return true;
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, execlockrelsinfo_list);
}
/*
@@ -880,7 +909,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv)
{
CachedPlan *plan;
- List *plist;
+ List *plist,
+ *execlockrelsinfo_list;
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
@@ -933,7 +963,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* Generate the plan.
*/
plist = pg_plan_queries(qlist, plansource->query_string,
- plansource->cursor_options, boundParams);
+ plansource->cursor_options, boundParams,
+ &execlockrelsinfo_list);
/* Release snapshot if we got one */
if (snapshot_set)
@@ -1002,6 +1033,16 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_saved = false;
plan->is_valid = true;
+ /*
+ * Save the dummy ExecLockRelsInfo list, that is a list containing NULLs
+ * as elements. We must do this, because users of the CachedPlan expect
+ * one to go with the list of PlannedStmts.
+ * XXX maybe get rid of that contract.
+ */
+ plan->execlockrelsinfo_context = NULL;
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
+ Assert(MemoryContextIsValid(plan->execlockrelsinfo_context));
+
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1160,7 +1201,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1586,6 +1627,49 @@ CopyCachedPlan(CachedPlanSource *plansource)
return newsource;
}
+/*
+ * CachedPlanSaveExecLockRelsInfos
+ * Save the list containing ExecLockRelsInfo nodes into the given
+ * CachedPlan
+ *
+ * The provided list is copied into a dedicated context that is a child of
+ * plan->context. If the child context already exists, it is emptied, because
+ * any ExecLockRelsInfo contained therein would no longer be useful.
+ */
+static void
+CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list)
+{
+ MemoryContext execlockrelsinfo_context = plan->execlockrelsinfo_context,
+ oldcontext = CurrentMemoryContext;
+ List *execlockrelsinfo_list_copy;
+
+ /*
+ * Set up the dedicated context if not already done, saving it as a child
+ * of the CachedPlan's context.
+ */
+ if (execlockrelsinfo_context == NULL)
+ {
+ execlockrelsinfo_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan execlockrelsinfo list",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextSetParent(execlockrelsinfo_context, plan->context);
+ MemoryContextSetIdentifier(execlockrelsinfo_context, plan->context->ident);
+ plan->execlockrelsinfo_context = execlockrelsinfo_context;
+ }
+ else
+ {
+ /* Just clear existing contents by resetting the context. */
+ Assert(MemoryContextIsValid(execlockrelsinfo_context));
+ MemoryContextReset(execlockrelsinfo_context);
+ }
+
+ MemoryContextSwitchTo(execlockrelsinfo_context);
+ execlockrelsinfo_list_copy = copyObject(execlockrelsinfo_list);
+ MemoryContextSwitchTo(oldcontext);
+
+ plan->execlockrelsinfo_list = execlockrelsinfo_list_copy;
+}
+
/*
* CachedPlanIsValid: test whether the rewritten querytree within a
* CachedPlanSource is currently valid (that is, not marked as being in need
@@ -1737,17 +1821,21 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * Returns a list of ExecLockRelsInfo nodes containing one element for each
+ * PlannedStmt in stmt_list or NULL if the latter is a utility statement or its
+ * containsInitialPruning is false.
*/
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+static List *
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams)
{
ListCell *lc1;
+ List *execlockrelsinfo_list = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ ExecLockRelsInfo *execlockrelsinfo = NULL;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,27 +1849,139 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
- continue;
+ ScanQueryForLocks(query, true);
}
-
- foreach(lc2, plannedstmt->rtable)
+ else
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (!plannedstmt->containsInitialPruning)
+ {
+ /*
+ * If the plan contains no initial pruning steps, just lock
+ * all the relations found in the range table.
+ */
+ ListCell *lc;
- if (rte->rtekind != RTE_RELATION)
- continue;
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ /*
+ * Acquire the appropriate type of lock on each relation
+ * OID. Note that we don't actually try to open the rel,
+ * and hence will not fail if it's been dropped entirely
+ * --- we'll just transiently acquire a non-conflicting
+ * lock.
+ */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ else
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ /*
+ * Walk the plan tree to find only the minimal set of
+ * relations to be locked, considering the effect of performing
+ * initial partition pruning.
+ */
+ execlockrelsinfo = ExecutorGetLockRels(plannedstmt, boundParams);
+ lockrels = execlockrelsinfo->lockrels;
+
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment above. */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ }
+
+ /*
+ * Remember ExecLockRelsInfo for later adding to the QueryDesc that
+ * will be passed to the executor when executing this plan. May be
+ * NULL, but must keep the list the same length as stmt_list.
+ */
+ execlockrelsinfo_list = lappend(execlockrelsinfo_list,
+ execlockrelsinfo);
+ }
+
+ return execlockrelsinfo_list;
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, execlockrelsinfo_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc2);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
/*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ }
+ else
+ {
+ if (execlockrelsinfo == NULL)
+ {
+ ListCell *lc;
+
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ lockrels = execlockrelsinfo->lockrels;
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..896f51be08 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -285,6 +285,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan)
{
AssertArg(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->execlockrelsinfos = execlockrelsinfos;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..fef75ba147 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fd5735a946..ded19b8cbb 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,4 +124,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
+extern Bitmapset *ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..4338463479 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecLockRelsInfo *execlockrelsinfo; /* ExecutorGetLockRels()'s output given plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 82925b4b63..5cf414cc11 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern ExecLockRelsInfo *ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params);
+extern bool ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 4cb78ee5b6..b53535c2a4 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern bool ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context);
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
diff --git a/src/include/executor/nodeMergeAppend.h b/src/include/executor/nodeMergeAppend.h
index 97fe3b0665..8eb4e9df93 100644
--- a/src/include/executor/nodeMergeAppend.h
+++ b/src/include/executor/nodeMergeAppend.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern bool ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context);
extern MergeAppendState *ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags);
extern void ExecEndMergeAppend(MergeAppendState *node);
extern void ExecReScanMergeAppend(MergeAppendState *node);
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index 1d225bc88d..5006499088 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -19,6 +19,7 @@ extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
EState *estate, TupleTableSlot *slot,
CmdType cmdtype);
+extern bool ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context);
extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
extern void ExecEndModifyTable(ModifyTableState *node);
extern void ExecReScanModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 44dd73fc80..1253fdb0ed 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -576,6 +576,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct ExecLockRelsInfo *es_execlockrelsinfo; /* QueryDesc.execlockrelsinfo */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -964,6 +965,101 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * ExecLockRelsInfo
+ *
+ * Result of performing ExecutorGetLockRels() for a given PlannedStmt
+ */
+typedef struct ExecLockRelsInfo
+{
+ NodeTag type;
+
+ /*
+ * Relations that must be locked to execute the plan tree contained in
+ * the PlannedStmt.
+ */
+ Bitmapset *lockrels;
+
+ /* PlannedStmt.numPlanNodes */
+ int numPlanNodes;
+
+ /*
+ * List of PlanInitPruningOutput, each representing the output of
+ * performing initial pruning on a given plan node, for all nodes in the
+ * plan tree that have been marked as needing initial pruning.
+ *
+ * 'ipoIndexes' is an array of 'numPlanNodes' elements, indexed with
+ * plan_node_id of the individual nodes in the plan tree, each a 1-based
+ * index into 'initPruningOutputs' list for a given plan node. 0 means
+ * that a given plan node has no entry in the list because of not needing
+ * any initial pruning done on it.
+ */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecLockRelsInfo;
+
+/*----------------
+ * ExecGetLockRelsContext
+ *
+ * Information pertaining to ExecutorGetLockRels() invocation for a given
+ * plan.
+ */
+typedef struct ExecGetLockRelsContext
+{
+ NodeTag type;
+
+ PlannedStmt *stmt; /* target plan */
+ ParamListInfo params; /* EXTERN parameters available for pruning */
+
+ /* Output parameters for ExecGetLockRels and its subroutines. */
+ Bitmapset *lockrels;
+
+ /* See the comment in the definition of ExecLockRelsInfo struct. */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecGetLockRelsContext;
+
+/*
+ * Appends the provided PlanInitPruningOutput to
+ * ExecGetLockRelsContext.initPruningOutputs
+ */
+#define ExecStorePlanInitPruningOutput(cxt, initPruningOutput, plannode) \
+ do { \
+ (cxt)->initPruningOutputs = lappend((cxt)->initPruningOutputs, initPruningOutput); \
+ (cxt)->ipoIndexes[(plannode)->plan_node_id] = list_length((cxt)->initPruningOutputs); \
+ } while (0)
+
+/*
+ * Finds the PlanInitPruningOutput for a given Plan node in
+ * ExecLockRelsInfo.initPruningOutputs.
+ */
+#define ExecFetchPlanInitPruningOutput(execlockrelsinfo, plannode) \
+ (((execlockrelsinfo) != NULL && (execlockrelsinfo)->initPruningOutputs != NIL) ? \
+ list_nth((execlockrelsinfo)->initPruningOutputs, \
+ (execlockrelsinfo)->ipoIndexes[(plannode)->plan_node_id] - 1) : NULL)
+
+/* ---------------
+ * PlanInitPruningOutput
+ *
+ * Node to remember the result of performing initial partition pruning steps
+ * during ExecutorGetLockRels() on nodes that support pruning.
+ *
+ * ExecGetLockRelsDoInitialPruning(), which runs during ExecutorGetLockRels(),
+ * creates it and stores it in the corresponding ExecLockRelsInfo.
+ *
+ * ExecInitPartitionPruning(), which runs during ExecutorStart(), fetches it
+ * from the EState's ExecLockRelsInfo (if any) and uses the value of
+ * initially_valid_subplans contained in it as-is to select the subplans to be
+ * initialized for execution, instead of re-evaluating that by performing
+ * initial pruning again.
+ */
+typedef struct PlanInitPruningOutput
+{
+ NodeTag type;
+
+ Bitmapset *initially_valid_subplans;
+} PlanInitPruningOutput;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 05f0b79e82..00c4d8293e 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -96,6 +96,11 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_ExecGetLockRelsContext,
+ T_ExecLockRelsInfo,
+ T_PlanInitPruningOutput,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 5327d9ba8b..019719c1a4 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -129,6 +129,10 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
PartitionDirectory partition_directory; /* partition descriptors */
Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index bd87c35d6c..bfdb5bbf28 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -59,10 +59,16 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
+ int numPlanNodes; /* number of nodes in planTree */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -1189,6 +1195,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1197,6 +1210,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 92291a750d..bf80c53bed 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -64,7 +64,7 @@ extern PlannedStmt *pg_plan_query(Query *querytree, const char *query_string,
ParamListInfo boundParams);
extern List *pg_plan_queries(List *querytrees, const char *query_string,
int cursorOptions,
- ParamListInfo boundParams);
+ ParamListInfo boundParams, List **execlockrelsinfo_list);
extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
extern void assign_max_stack_depth(int newval, void *extra);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..56b0dcc6bd 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -148,6 +148,9 @@ typedef struct CachedPlan
{
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
+ List *execlockrelsinfo_list; /* list of ExecLockRelsInfo nodes with one
+ * element for each of stmt_list; NIL
+ * if not a generic plan */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
@@ -158,6 +161,9 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext execlockrelsinfo_context; /* context containing
+ * execlockrelsinfo_list,
+ * a child of the above context */
} CachedPlan;
/*
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9abace6734 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,10 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *execlockrelsinfos; /* list of ExecLockRelsInfos with one element
+ * for each of 'stmts'; same as
+ * cplan->execlockrelsinfo_list if cplan is
+ * not NULL */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -241,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.24.1
* Re: generic plans and "initial" pruning
@ 2022-03-31 03:25 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 2 replies; 82+ messages in thread
From: Amit Langote @ 2022-03-31 03:25 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Tom Lane <[email protected]>; pgsql-hackers; David Rowley *EXTERN* <[email protected]>
On Mon, Mar 28, 2022 at 4:28 PM Amit Langote <[email protected]> wrote:
> On Mon, Mar 28, 2022 at 4:17 PM Amit Langote <[email protected]> wrote:
> > Other than the changes mentioned above, the updated patch now contains
> > a bit more commentary than earlier versions, mostly around
> > AcquireExecutorLocks()'s new way of determining the set of relations
> > to lock and the significantly redesigned working of the "initial"
> > execution pruning.
>
> Forgot to rebase over the latest HEAD, so here's v7. Also fixed that
> _out and _read functions for PlanInitPruningOutput were using an
> obsolete node label.
Rebased.
--
Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v8-0004-Optimize-AcquireExecutorLocks-to-skip-pruned-part.patch (94.3K, 2-v8-0004-Optimize-AcquireExecutorLocks-to-skip-pruned-part.patch)
From 9e0ae8887a9f3d75feb4df969dde504a21d3700d Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v8 4/4] Optimize AcquireExecutorLocks() to skip pruned
partitions
When the PlannedStmt indicates that some nodes in the plan tree can
do partition pruning without depending on execution having started
(so-called "initial" pruning), AcquireExecutorLocks() no longer locks
all relations listed in the range table. Instead, it calls the new
executor function ExecutorGetLockRels(), which returns the set of
relations (their RT indexes) to be locked, not including those
scanned by the subplans that are pruned.
The result of pruning done this way must be remembered and reused
during actual execution of the plan. To that end, a
PlanInitPruningOutput node is created for each plan node that
undergoes pruning, and the set of those nodes for the whole plan tree
is added to an ExecLockRelsInfo, which also stores the bitmapset of
RT indexes of the relations actually locked by AcquireExecutorLocks().
ExecLockRelsInfos are passed down to the executor alongside the
PlannedStmts. This arrangement ensures that the executor doesn't
accidentally try to process plan tree subnodes that have been deemed
pruned by AcquireExecutorLocks().
---
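The arrangement described in the commit message can be sketched in
isolation as follows. This is only a toy model under stated assumptions,
not code from the patch: SketchAppend, sketch_get_lock_rels() and
sketch_subplan_is_valid() are hypothetical names, and a uint64_t stands in
for a Bitmapset. It shows the two halves of the idea: the lock set is
built from the parent table plus only the children that survive initial
pruning, and the executor side later consults the remembered result
instead of assuming every subplan is usable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
#define SKETCH_MAX_SUBPLANS 64

typedef struct SketchAppend
{
    int      nsubplans;                        /* number of child subplans */
    int      subplan_rti[SKETCH_MAX_SUBPLANS]; /* RT index scanned by each child */
    int      parent_rti;                       /* partitioned table driving pruning */
    uint64_t initially_valid;                  /* bit i set => child i survived
                                                * initial pruning */
} SketchAppend;

/* Add one RT index (1..63) to a toy bitmapset. */
static uint64_t
sketch_add_member(uint64_t set, int rti)
{
    assert(rti > 0 && rti < 64);
    return set | (UINT64_C(1) << rti);
}

/* Test whether an RT index (1..63) is in a toy bitmapset. */
static bool
sketch_is_member(uint64_t set, int rti)
{
    assert(rti > 0 && rti < 64);
    return (set & (UINT64_C(1) << rti)) != 0;
}

/*
 * Lock-collection side: the parent table is always included, because it
 * must be locked to do the pruning; a child is included only if it
 * survived initial pruning.
 */
static uint64_t
sketch_get_lock_rels(const SketchAppend *node)
{
    uint64_t lockrels = sketch_add_member(0, node->parent_rti);

    assert(node->nsubplans >= 0 && node->nsubplans <= SKETCH_MAX_SUBPLANS);
    for (int i = 0; i < node->nsubplans; i++)
    {
        if (node->initially_valid & (UINT64_C(1) << i))
            lockrels = sketch_add_member(lockrels, node->subplan_rti[i]);
    }
    return lockrels;
}

/*
 * Executor side: reuse the remembered pruning result.  A subplan whose
 * bit is unset was never locked and must not be initialized.
 */
static bool
sketch_subplan_is_valid(const SketchAppend *node, int i)
{
    assert(i >= 0 && i < node->nsubplans);
    return (node->initially_valid & (UINT64_C(1) << i)) != 0;
}
```

In this model, a node with three children where only the second one
survives pruning yields a lock set containing just the parent and that
child, and sketch_subplan_is_valid() returns false for the other two even
though they are still present in the node.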
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 13 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 17 +-
src/backend/executor/README | 24 +++
src/backend/executor/execMain.c | 202 ++++++++++++++++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 224 ++++++++++++++++++----
src/backend/executor/execUtils.c | 8 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 52 ++++-
src/backend/executor/nodeMergeAppend.c | 52 ++++-
src/backend/executor/nodeModifyTable.c | 25 +++
src/backend/executor/spi.c | 14 +-
src/backend/nodes/copyfuncs.c | 49 ++++-
src/backend/nodes/outfuncs.c | 39 ++++
src/backend/nodes/readfuncs.c | 37 ++++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 6 +
src/backend/partitioning/partprune.c | 37 +++-
src/backend/tcop/postgres.c | 15 +-
src/backend/tcop/pquery.c | 21 ++-
src/backend/utils/cache/plancache.c | 252 ++++++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execPartition.h | 2 +
src/include/executor/execdesc.h | 2 +
src/include/executor/executor.h | 2 +
src/include/executor/nodeAppend.h | 1 +
src/include/executor/nodeMergeAppend.h | 1 +
src/include/executor/nodeModifyTable.h | 1 +
src/include/nodes/execnodes.h | 96 ++++++++++
src/include/nodes/nodes.h | 5 +
src/include/nodes/pathnodes.h | 4 +
src/include/nodes/plannodes.h | 15 ++
src/include/tcop/tcopprot.h | 2 +-
src/include/utils/plancache.h | 6 +
src/include/utils/portal.h | 5 +
41 files changed, 1174 insertions(+), 104 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 55c38b04c4..d403eb2309 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -542,7 +542,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index cb13227db1..e5dff2bc25 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, execlockrelsinfo, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1013790dbb..008b8ce0e9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -741,8 +741,10 @@ execute_sql_string(const char *sql)
RawStmt *parsetree = lfirst_node(RawStmt, lc1);
MemoryContext per_parsetree_context,
oldcontext;
- List *stmt_list;
- ListCell *lc2;
+ List *stmt_list,
+ *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
/*
* We do the work for each parsetree in a short-lived context, to
@@ -762,11 +764,13 @@ execute_sql_string(const char *sql)
NULL,
0,
NULL);
- stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL);
+ stmt_list = pg_plan_queries(stmt_list, sql, CURSOR_OPT_PARALLEL_OK, NULL,
+ &execlockrelsinfo_list);
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
CommandCounterIncrement();
@@ -777,6 +781,7 @@ execute_sql_string(const char *sql)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ execlockrelsinfo,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 05e7b60059..4ef44aaf23 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -416,7 +416,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 9902c5c566..85e73ddded 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -107,6 +107,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL), /* no ExecLockRelsInfo to pass */
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..bbbf8bbcbd 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *plan_execlockrelsinfo_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -195,6 +196,7 @@ ExecuteQuery(ParseState *pstate,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* DO NOT add any logic that could possibly throw an error between
@@ -204,7 +206,7 @@ ExecuteQuery(ParseState *pstate,
NULL,
query_string,
entry->plansource->commandTag,
- plan_list,
+ plan_list, plan_execlockrelsinfo_list,
cplan);
/*
@@ -576,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *plan_execlockrelsinfo_list;
+ ListCell *p,
+ *pe;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -632,15 +636,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ plan_execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pe, plan_execlockrelsinfo_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, pe);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, execlockrelsinfo, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..b45ca508a8 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,27 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Execution-time pruning may in fact also occur before execution has started.
+One case where that occurs is when a cached generic plan is being validated
+for execution by GetCachedPlan() in plancache.c, which proceeds by locking
+all the relations that will be scanned by that plan. If the generic plan has
+nodes that contain so-called initial pruning steps (the subset of execution
+pruning steps that do not depend on full-fledged execution having started),
+those steps are performed at this point to determine the minimal set of child
+subplans that satisfy the pruning instructions. The result of that pruning
+is saved in a data structure that gets passed to the executor alongside the
+plan tree. Only the relations scanned by the surviving subplans are then
+locked; those scanned by the pruned subplans are not, even though the pruned
+subplans themselves are not removed from the plan tree. It is therefore
+imperative that the executor, and any third-party code invoked by it that
+gets passed the plan tree, consult the initial pruning result made available
+via the aforementioned data structure to determine whether or not a
+particular subplan is valid. (The data structure basically consists of an
+array of PlanInitPruningOutput nodes containing one element for each node of
+the plan tree, indexable using the plan_node_id of the individual plan nodes,
+where each element contains a bitmapset of the indexes of the unpruned child
+subplans of a given node.)
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +307,9 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorGetLockRels ] --- an optional step to walk over the plan tree
+ to produce an ExecLockRelsInfo to be passed to CreateQueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..56946c12dd 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,11 +49,15 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/nodeAppend.h"
+#include "executor/nodeMergeAppend.h"
+#include "executor/nodeModifyTable.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
#include "mb/pg_wchar.h"
#include "miscadmin.h"
+#include "nodes/nodeFuncs.h"
#include "parser/parsetree.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
@@ -101,9 +105,205 @@ static char *ExecBuildSlotValueDescription(Oid reloid,
Bitmapset *modifiedCols,
int maxfieldlen);
static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
+static bool ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorGetLockRels
+ *
+ * Figure out the minimal set of relations to lock to be able to safely
+ * execute a given plan
+ *
+ * This ignores the relations scanned by child subplans that are pruned away
+ * after performing initial pruning steps present in the plan using the
+ * provided set of EXTERN parameters.
+ *
+ * Along with the set of RT indexes of relations that must be locked, the
+ * returned struct also contains an array of PlanInitPruningOutput nodes each
+ * of which contains the result of initial pruning for a given plan node, which
+ * is basically a bitmapset of the indexes of surviving child subplans. Each
+ * plan node in the tree that undergoes pruning will have an element in the
+ * array.
+ *
+ * Note that while relations scanned by the subplans that are pruned will not
+ * be locked, the subplans themselves are left as-is in the plan tree, assuming
+ * anything that reads the plan tree during execution knows to ignore them by
+ * looking at the PlanInitPruningOutput's list of valid subplans.
+ *
+ * Partitioned tables mentioned in PartitionedRelPruneInfo nodes that drive
+ * the pruning will be locked before doing the pruning and also added to the
+ * returned set.
+ */
+ExecLockRelsInfo *
+ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ int numPlanNodes = plannedstmt->numPlanNodes;
+ ExecGetLockRelsContext context;
+ ExecLockRelsInfo *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ context.stmt = plannedstmt;
+ context.params = params;
+
+ /*
+ * Go walk all the plan tree(s) present in the PlannedStmt, filling
+ * context.lockrels with only the relations from plan nodes that
+ * survive initial pruning and also the tables mentioned in
+ * partitioned_rels sets found in the plan.
+ */
+ context.lockrels = NULL;
+ context.initPruningOutputs = NIL;
+ context.ipoIndexes = palloc0(sizeof(int) * numPlanNodes);
+
+ /* All the subplans. */
+ foreach(lc, plannedstmt->subplans)
+ {
+ Plan *subplan = lfirst(lc);
+
+ (void) ExecGetLockRels(subplan, &context);
+ }
+
+ /* And the main tree. */
+ (void) ExecGetLockRels(plannedstmt->planTree, &context);
+
+ /*
+ * Also be sure to lock partitioned relations from any [Merge]Append nodes
+ * that were originally present but were ultimately left out from the plan
+ * due to being deemed no-op nodes.
+ */
+ context.lockrels = bms_add_members(context.lockrels,
+ plannedstmt->elidedAppendPartedRels);
+
+ result = makeNode(ExecLockRelsInfo);
+ result->lockrels = context.lockrels;
+ result->numPlanNodes = numPlanNodes;
+ result->initPruningOutputs = context.initPruningOutputs;
+ result->ipoIndexes = context.ipoIndexes;
+
+ return result;
+}
+
+/* ------------------------------------------------------------------------
+ * ExecGetLockRels
+ * Adds all the relations that will be scanned by 'node' and its child
+ * plans to context->lockrels after taking into account the effect
+ * of performing initial pruning, if any
+ *
+ * context->stmt gives the PlannedStmt being inspected to access the plan's
+ * range table if needed and context->params the set of EXTERN parameters
+ * available to evaluate pruning parameters.
+ *
+ * If initial pruning is done, a PlanInitPruningOutput node containing the
+ * result of pruning will be stored in context->initPruningOutputs that will
+ * be made available to the executor to reuse.
+ * ------------------------------------------------------------------------
+ */
+bool
+ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context)
+{
+ /* Do nothing when we get to the end of a leaf on tree. */
+ if (node == NULL)
+ return true;
+
+ /* Make sure there's enough stack available. */
+ check_stack_depth();
+
+ switch (nodeTag(node))
+ {
+ /* Currently, only these two nodes have prunable child subplans. */
+ case T_Append:
+ if (ExecGetAppendLockRels((Append *) node, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (ExecGetMergeAppendLockRels((MergeAppend *) node,
+ context))
+ return true;
+ break;
+
+ /*
+ * And these manipulate relations that must be added to context->lockrels.
+ */
+ case T_SeqScan:
+ case T_SampleScan:
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ case T_BitmapIndexScan:
+ case T_BitmapHeapScan:
+ case T_TidScan:
+ case T_TidRangeScan:
+ case T_ForeignScan:
+ case T_SubqueryScan:
+ case T_CustomScan:
+ if (ExecGetScanLockRels((Scan *) node, context))
+ return true;
+ break;
+ case T_ModifyTable:
+ if (ExecGetModifyTableLockRels((ModifyTable *) node, context))
+ return true;
+ /* plan_tree_walker() will visit the subplan (outerNode) */
+ break;
+
+ default:
+ break;
+ }
+
+ /* Recurse to subnodes. */
+ return plan_tree_walker(node, ExecGetLockRels, (void *) context);
+}
+
+/*
+ * ExecGetScanLockRels
+ * Do ExecGetLockRels()'s work for a leaf Scan node
+ */
+static bool
+ExecGetScanLockRels(Scan *scan, ExecGetLockRelsContext *context)
+{
+ switch (nodeTag(scan))
+ {
+ case T_ForeignScan:
+ {
+ ForeignScan *fscan = (ForeignScan *) scan;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ fscan->fs_relids);
+ }
+ break;
+
+ case T_SubqueryScan:
+ {
+ SubqueryScan *sscan = (SubqueryScan *) scan;
+
+ (void) ExecGetLockRels((Plan *) sscan->subplan, context);
+ }
+ break;
+
+ case T_CustomScan:
+ {
+ CustomScan *cscan = (CustomScan *) scan;
+ ListCell *lc;
+
+ context->lockrels = bms_add_members(context->lockrels,
+ cscan->custom_relids);
+ foreach(lc, cscan->custom_plans)
+ {
+ (void) ExecGetLockRels((Plan *) lfirst(lc), context);
+ }
+ }
+ break;
+
+ default:
+ context->lockrels = bms_add_member(context->lockrels,
+ scan->scanrelid);
+ break;
+ }
+
+ return true;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +1006,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ ExecLockRelsInfo *execlockrelsinfo = queryDesc->execlockrelsinfo;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -825,6 +1026,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_execlockrelsinfo = execlockrelsinfo;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 9a0d5d59ef..fb6dbd298a 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_EXECLOCKRELSINFO UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
@@ -596,12 +598,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *execlockrelsinfo_data;
+ char *execlockrelsinfo_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int execlockrelsinfo_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -630,6 +635,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ execlockrelsinfo_data = nodeToString(estate->es_execlockrelsinfo);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -656,6 +662,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized ExecLockRelsInfo. */
+ execlockrelsinfo_len = strlen(execlockrelsinfo_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, execlockrelsinfo_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -750,6 +761,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized ExecLockRelsInfo */
+ execlockrelsinfo_space = shm_toc_allocate(pcxt->toc, execlockrelsinfo_len);
+ memcpy(execlockrelsinfo_space, execlockrelsinfo_data, execlockrelsinfo_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ execlockrelsinfo_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1231,8 +1248,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *execlockrelsinfospace;
char *paramspace;
PlannedStmt *pstmt;
+ ExecLockRelsInfo *execlockrelsinfo;
ParamListInfo paramLI;
char *queryString;
@@ -1243,12 +1262,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied ExecLockRelsInfo. */
+ execlockrelsinfospace = shm_toc_lookup(toc, PARALLEL_KEY_EXECLOCKRELSINFO,
+ false);
+ execlockrelsinfo = (ExecLockRelsInfo *) stringToNode(execlockrelsinfospace);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, execlockrelsinfo,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 84b4e4b3d6..e79ada16f0 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,8 +186,13 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
-static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1588,8 +1594,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or even before during ExecutorGetLockRels().
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1601,10 +1608,17 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
* returned by the partition pruning code into subplan indexes. Also
- * determines the set of initially valid subplans by performing initial
- * pruning steps, only which need be initialized by the caller such as
- * ExecInitAppend. Maps in PartitionPruneState are updated to account
- * for initial pruning having eliminated some of the subplans, if any.
+ * determines the set of initially valid subplans by either looking that
+ * up in the plan node's PlanInitPruningOutput, if one is found in
+ * EState.es_execlockrelsinfo, or by performing initial pruning steps.
+ * Only the subplans included in that need be initialized by the caller
+ * such as ExecInitAppend. Maps in PartitionPruneState are updated to
+ * account for initial pruning having eliminated some of the subplans,
+ * if any.
+ *
+ * ExecGetLockRelsDoInitialPruning:
+ * Do initial pruning as part of ExecGetLockRels() on the parent plan
+ * node
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
@@ -1619,9 +1633,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* ExecInitPartitionPruning
* Initialize data structure needed for run-time partition pruning
*
- * Initial pruning can be done immediately, so it is done here if needed and
- * the set of surviving partition subplans' indexes are added to the output
- * parameter *initially_valid_subplans.
+ * Initial pruning can be done immediately, so it is done here unless it has
+ * already been done by ExecGetLockRelsDoInitialPruning(), and the set of
+ * surviving partition subplans' indexes are added to the output parameter
+ * *initially_valid_subplans.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1635,22 +1650,57 @@ ExecInitPartitionPruning(PlanState *planstate,
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ Plan *plan = planstate->plan;
+ PlanInitPruningOutput *initPruningOutput = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /* Retrieve the parent plan's PlanInitPruningOutput, if any. */
+ if (estate->es_execlockrelsinfo)
+ {
+ initPruningOutput = (PlanInitPruningOutput *)
+ ExecFetchPlanInitPruningOutput(estate->es_execlockrelsinfo, plan);
- /*
- * Create the working data structure for pruning.
- */
- prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+ Assert(initPruningOutput != NULL &&
+ IsA(initPruningOutput, PlanInitPruningOutput));
+ /* No need to do initial pruning again, only exec pruning. */
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
+
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PlanInitPruningOutput.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo,
+ initPruningOutput == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune, if required.
*/
- if (prunestate->do_initial_prune)
+ if (initPruningOutput)
+ {
+ /* ExecGetLockRelsDoInitialPruning() already did it for us! */
+ *initially_valid_subplans = initPruningOutput->initially_valid_subplans;
+ }
+ else if (prunestate && prunestate->do_initial_prune)
{
/* Determine which subplans survive initial pruning */
- *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate,
+ pruneinfo);
}
else
{
@@ -1668,7 +1718,7 @@ ExecInitPartitionPruning(PlanState *planstate,
* invalid data in prunestate, because that data won't be consulted again
* (cf initial Assert in ExecFindMatchingSubPlans).
*/
- if (prunestate->do_exec_prune &&
+ if (prunestate && prunestate->do_exec_prune &&
bms_num_members(*initially_valid_subplans) < n_total_subplans)
PartitionPruneStateFixSubPlanMap(prunestate,
*initially_valid_subplans,
@@ -1677,12 +1727,75 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecGetLockRelsDoInitialPruning
+ * Perform initial pruning as part of doing ExecGetLockRels() on the parent
+ * plan node
+ */
+Bitmapset *
+ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo)
+{
+ List *rtable = context->stmt->rtable;
+ ParamListInfo params = context->params;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ PlanInitPruningOutput *initPruningOutput;
+
+ /*
+ * A temporary context to allocate stuff needed to run the pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so must create
+ * a standalone ExprContext to evaluate pruning expressions, equipped with
+ * the information about the EXTERN parameters that the caller passed us.
+ * Note that that's okay because the initial pruning steps do not contain
+ * anything that requires the execution to have started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = ExecCreatePartitionPruneState(NULL, pruneinfo,
+ true, false,
+ rtable, econtext,
+ pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the pruning and populate a PlanInitPruningOutput for this node. */
+ initPruningOutput = makeNode(PlanInitPruningOutput);
+ initPruningOutput->initially_valid_subplans =
+ ExecFindInitialMatchingSubPlans(prunestate, pruneinfo);
+ ExecStorePlanInitPruningOutput(context, initPruningOutput, plan);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return initPruningOutput->initially_valid_subplans;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
* ExecFindInitialMatchingSubPlans and ExecFindMatchingSubPlans.
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'partitionpruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1697,19 +1810,20 @@ ExecInitPartitionPruning(PlanState *planstate,
*/
static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo)
+ PartitionPruneInfo *partitionpruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(partitionpruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1760,19 +1874,48 @@ ExecCreatePartitionPruneState(PlanState *planstate,
PartitionedRelPruneInfo *pinfo = lfirst_node(PartitionedRelPruneInfo, lc2);
PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
Relation partrel;
+ bool close_partrel = false;
PartitionDesc partdesc;
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as when called during
+ * ExecutorGetLockRels() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ close_partrel = true;
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (close_partrel)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1874,7 +2017,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1884,7 +2027,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -1998,7 +2141,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
* is required.
*/
static Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
+ PartitionPruneInfo *pruneinfo)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2008,8 +2152,8 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
Assert(prunestate->do_initial_prune);
/*
- * Switch to a temp context to avoid leaking memory in the executor's
- * query-lifespan memory context.
+ * Switch to a temp context to avoid leaking memory in the longer-term
+ * memory context.
*/
oldcontext = MemoryContextSwitchTo(prunestate->prune_context);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..7246f9175f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_execlockrelsinfo = NULL;
estate->es_junkFilter = NULL;
@@ -785,6 +786,13 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
Assert(rti > 0 && rti <= estate->es_range_table_size);
+ /*
+ * A cross-check that AcquireExecutorLocks() hasn't missed any relations
+ * that it should have locked.
+ */
+ Assert(estate->es_execlockrelsinfo == NULL ||
+ bms_is_member(rti, estate->es_execlockrelsinfo->lockrels));
+
rel = estate->es_relations[rti - 1];
if (rel == NULL)
{
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 5b6d3eb23b..9c6f907687 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -94,6 +94,55 @@ static bool ExecAppendAsyncRequest(AppendState *node, TupleTableSlot **result);
static void ExecAppendAsyncEventWait(AppendState *node);
static void classify_matching_subplans(AppendState *node);
+/* ----------------------------------------------------------------
+ * ExecGetAppendLockRels
+ * Do ExecGetLockRels()'s work for an Append plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this Append.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->appendplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitAppend
*
@@ -155,7 +204,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 9a9f29e845..4b04fcdbc2 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -54,6 +54,55 @@ typedef int32 SlotNumber;
static TupleTableSlot *ExecMergeAppend(PlanState *pstate);
static int heap_compare_slots(Datum a, Datum b, void *arg);
+/* ----------------------------------------------------------------
+ * ExecGetMergeAppendLockRels
+ * Do ExecGetLockRels()'s work for a MergeAppend plan
+ * ----------------------------------------------------------------
+ */
+bool
+ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context)
+{
+ PartitionPruneInfo *pruneinfo = node->part_prune_info;
+
+ /*
+ * Must always lock all the partitioned tables whose direct and indirect
+ * partitions will be scanned by this MergeAppend.
+ */
+ context->lockrels = bms_add_members(context->lockrels,
+ node->partitioned_rels);
+
+ /*
+ * Now recurse to subplans to add relations scanned therein.
+ *
+ * If initial pruning can be done, do that now and only recurse to the
+ * surviving subplans.
+ */
+ if (pruneinfo && pruneinfo->needs_init_pruning)
+ {
+ List *subplans = node->mergeplans;
+ Bitmapset *validsubplans;
+ int i;
+
+ validsubplans = ExecGetLockRelsDoInitialPruning((Plan *) node,
+ context, pruneinfo);
+
+ /* Recurse to surviving subplans. */
+ i = -1;
+ while ((i = bms_next_member(validsubplans, i)) >= 0)
+ {
+ Plan *subplan = list_nth(subplans, i);
+
+ (void) ExecGetLockRels(subplan, context);
+ }
+
+ /* done with this node */
+ return true;
+ }
+
+ /* Tell the caller to recurse to *all* the subplans. */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitMergeAppend
@@ -103,7 +152,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 171575cd73..f17bede367 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -3853,6 +3853,31 @@ ExecLookupResultRelByOid(ModifyTableState *node, Oid resultoid,
return NULL;
}
+/*
+ * ExecGetModifyTableLockRels
+ * Do ExecGetLockRels()'s work for a ModifyTable plan
+ */
+bool
+ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context)
+{
+ ListCell *lc;
+
+ /* First add the result relation RTIs mentioned in the node. */
+ if (plan->rootRelation > 0)
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->rootRelation);
+ context->lockrels = bms_add_member(context->lockrels,
+ plan->nominalRelation);
+ foreach(lc, plan->resultRelations)
+ {
+ context->lockrels = bms_add_member(context->lockrels,
+ lfirst_int(lc));
+ }
+
+ /* Tell the caller to recurse to the subplan (outerPlan(plan)). */
+ return false;
+}
+
/* ----------------------------------------------------------------
* ExecInitModifyTable
* ----------------------------------------------------------------
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 042a5f8b0a..64ebbfb31e 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *execlockrelsinfo_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1659,6 +1660,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
/* Replan if needed, and increment plan refcount for portal */
cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
if (!plan->saved)
{
@@ -1670,6 +1672,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
+ execlockrelsinfo_list = copyObject(execlockrelsinfo_list);
MemoryContextSwitchTo(oldcontext);
ReleaseCachedPlan(cplan, NULL);
cplan = NULL; /* portal shouldn't depend on cplan */
@@ -1683,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ execlockrelsinfo_list,
cplan);
/*
@@ -2473,7 +2477,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *execlockrelsinfo_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2552,6 +2558,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ execlockrelsinfo_list = cplan->execlockrelsinfo_list;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2589,9 +2596,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, execlockrelsinfo_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2671,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, execlockrelsinfo,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 29c515d7db..afffabbea0 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -68,6 +68,13 @@
} \
} while (0)
+/* Copy a field that is an array with numElem ints */
+#define COPY_INT_ARRAY(fldname, numElem) \
+ do { \
+ newnode->fldname = (numElem) > 0 ? palloc((numElem) * sizeof(int)) : NULL; \
+ memcpy(newnode->fldname, from->fldname, sizeof(int) * (numElem)); \
+ } while (0)
+
/* Copy a parse location field (for Copy, this is same as scalar case) */
#define COPY_LOCATION_FIELD(fldname) \
(newnode->fldname = from->fldname)
@@ -94,8 +101,10 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(transientPlan);
COPY_SCALAR_FIELD(dependsOnRole);
COPY_SCALAR_FIELD(parallelModeNeeded);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_SCALAR_FIELD(numPlanNodes);
COPY_NODE_FIELD(rtable);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
@@ -1282,6 +1291,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -5373,6 +5384,33 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static ExecLockRelsInfo *
+_copyExecLockRelsInfo(const ExecLockRelsInfo *from)
+{
+ ExecLockRelsInfo *newnode = makeNode(ExecLockRelsInfo);
+
+ COPY_BITMAPSET_FIELD(lockrels);
+ COPY_SCALAR_FIELD(numPlanNodes);
+ COPY_NODE_FIELD(initPruningOutputs);
+ COPY_INT_ARRAY(ipoIndexes, from->numPlanNodes);
+
+ return newnode;
+}
+
+static PlanInitPruningOutput *
+_copyPlanInitPruningOutput(const PlanInitPruningOutput *from)
+{
+ PlanInitPruningOutput *newnode = makeNode(PlanInitPruningOutput);
+
+ COPY_BITMAPSET_FIELD(initially_valid_subplans);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -5427,7 +5465,6 @@ _copyBitString(const BitString *from)
return newnode;
}
-
static ForeignKeyCacheInfo *
_copyForeignKeyCacheInfo(const ForeignKeyCacheInfo *from)
{
@@ -6454,6 +6491,16 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ retval = _copyExecLockRelsInfo(from);
+ break;
+ case T_PlanInitPruningOutput:
+ retval = _copyPlanInitPruningOutput(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 108ede9af9..e2d7e6bcac 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -312,8 +312,10 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(transientPlan);
WRITE_BOOL_FIELD(dependsOnRole);
WRITE_BOOL_FIELD(parallelModeNeeded);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_INT_FIELD(numPlanNodes);
WRITE_NODE_FIELD(rtable);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
@@ -1008,6 +1010,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -2818,6 +2822,31 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outExecLockRelsInfo(StringInfo str, const ExecLockRelsInfo *node)
+{
+ WRITE_NODE_TYPE("EXECLOCKRELSINFO");
+
+ WRITE_BITMAPSET_FIELD(lockrels);
+ WRITE_INT_FIELD(numPlanNodes);
+ WRITE_NODE_FIELD(initPruningOutputs);
+ WRITE_INT_ARRAY(ipoIndexes, node->numPlanNodes);
+}
+
+static void
+_outPlanInitPruningOutput(StringInfo str, const PlanInitPruningOutput *node)
+{
+ WRITE_NODE_TYPE("PLANINITPRUNINGOUTPUT");
+
+ WRITE_BITMAPSET_FIELD(initially_valid_subplans);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4720,6 +4749,16 @@ outNode(StringInfo str, const void *obj)
_outJsonItemCoercions(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_ExecLockRelsInfo:
+ _outExecLockRelsInfo(str, obj);
+ break;
+ case T_PlanInitPruningOutput:
+ _outPlanInitPruningOutput(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index ce146dd45e..88173f70a1 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1782,8 +1782,10 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(transientPlan);
READ_BOOL_FIELD(dependsOnRole);
READ_BOOL_FIELD(parallelModeNeeded);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_INT_FIELD(numPlanNodes);
READ_NODE_FIELD(rtable);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
@@ -2735,6 +2737,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2904,6 +2908,35 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+/*
+ * _readExecLockRelsInfo
+ */
+static ExecLockRelsInfo *
+_readExecLockRelsInfo(void)
+{
+ READ_LOCALS(ExecLockRelsInfo);
+
+ READ_BITMAPSET_FIELD(lockrels);
+ READ_INT_FIELD(numPlanNodes);
+ READ_NODE_FIELD(initPruningOutputs);
+ READ_INT_ARRAY(ipoIndexes, local_node->numPlanNodes);
+
+ READ_DONE();
+}
+
+/*
+ * _readPlanInitPruningOutput
+ */
+static PlanInitPruningOutput *
+_readPlanInitPruningOutput(void)
+{
+ READ_LOCALS(PlanInitPruningOutput);
+
+ READ_BITMAPSET_FIELD(initially_valid_subplans);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3197,6 +3230,10 @@ parseNodeString(void)
return_value = _readJsonCoercion();
else if (MATCH("JSONITEMCOERCIONS", 17))
return_value = _readJsonItemCoercions();
+ else if (MATCH("EXECLOCKRELSINFO", 16))
+ return_value = _readExecLockRelsInfo();
+ else if (MATCH("PLANINITPRUNINGOUTPUT", 21))
+ return_value = _readPlanInitPruningOutput();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c769b4b4b9..4c586ac1ec 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -517,7 +517,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->transientPlan = glob->transientPlan;
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->planTree = top_plan;
+ result->numPlanNodes = glob->lastPlanNodeId;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 8214edec54..a1c6c3caa2 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1623,6 +1623,9 @@ set_append_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (aplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
@@ -1710,6 +1713,9 @@ set_mergeappend_references(PlannerInfo *root,
pinfo->rtindex += rtoffset;
}
}
+
+ if (mplan->part_prune_info->needs_init_pruning)
+ root->glob->containsInitialPruning = true;
}
/* We don't need to recurse to lefttree or righttree ... */
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 7080cb25d9..3322dc79f2 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -230,6 +232,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -309,12 +313,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -323,6 +331,10 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+ if (!needs_init_pruning)
+ needs_init_pruning = partrel_needs_init_pruning;
+ if (!needs_exec_pruning)
+ needs_exec_pruning = partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -337,6 +349,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -435,13 +449,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -452,6 +471,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -539,6 +562,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -613,6 +639,11 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ if (!*needs_init_pruning)
+ *needs_init_pruning = (initial_pruning_steps != NIL);
+ if (!*needs_exec_pruning)
+ *needs_exec_pruning = (exec_pruning_steps != NIL);
+
pinfolist = lappend(pinfolist, pinfo);
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ba2fcfeb4a..085eb3f209 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -945,15 +945,17 @@ pg_plan_query(Query *querytree, const char *query_string, int cursorOptions,
* For normal optimizable statements, invoke the planner. For utility
* statements, just make a wrapper PlannedStmt node.
*
- * The result is a list of PlannedStmt nodes.
+ * The result is a list of PlannedStmt nodes. Also, a NULL is appended to
+ * *execlockrelsinfo_list for each PlannedStmt added to the returned list.
*/
List *
pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
- ParamListInfo boundParams)
+ ParamListInfo boundParams, List **execlockrelsinfo_list)
{
List *stmt_list = NIL;
ListCell *query_list;
+ *execlockrelsinfo_list = NIL;
foreach(query_list, querytrees)
{
Query *query = lfirst_node(Query, query_list);
@@ -977,6 +979,7 @@ pg_plan_queries(List *querytrees, const char *query_string, int cursorOptions,
}
stmt_list = lappend(stmt_list, stmt);
+ *execlockrelsinfo_list = lappend(*execlockrelsinfo_list, NULL);
}
return stmt_list;
@@ -1080,7 +1083,8 @@ exec_simple_query(const char *query_string)
QueryCompletion qc;
MemoryContext per_parsetree_context = NULL;
List *querytree_list,
- *plantree_list;
+ *plantree_list,
+ *plantree_execlockrelsinfo_list;
Portal portal;
DestReceiver *receiver;
int16 format;
@@ -1167,7 +1171,8 @@ exec_simple_query(const char *query_string)
NULL, 0, NULL);
plantree_list = pg_plan_queries(querytree_list, query_string,
- CURSOR_OPT_PARALLEL_OK, NULL);
+ CURSOR_OPT_PARALLEL_OK, NULL,
+ &plantree_execlockrelsinfo_list);
/*
* Done with the snapshot used for parsing/planning.
@@ -1203,6 +1208,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ plantree_execlockrelsinfo_list,
NULL);
/*
@@ -1991,6 +1997,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ cplan->execlockrelsinfo_list,
cplan);
/* Done with the snapshot used for parameter I/O and parsing/planning */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..0fd8c65de7 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->execlockrelsinfo = execlockrelsinfo; /* ExecutorGetLockRels() output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * execlockrelsinfo: ExecutorGetLockRels() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, execlockrelsinfo, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -493,6 +497,7 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ linitial_node(ExecLockRelsInfo, portal->execlockrelsinfos),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1193,7 +1198,8 @@ PortalRunMulti(Portal portal,
QueryCompletion *qc)
{
bool active_snapshot_set = false;
- ListCell *stmtlist_item;
+ ListCell *stmtlist_item,
+ *execlockrelsinfolist_item;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1214,9 +1220,12 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
- foreach(stmtlist_item, portal->stmts)
+ forboth(stmtlist_item, portal->stmts,
+ execlockrelsinfolist_item, portal->execlockrelsinfos)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo,
+ execlockrelsinfolist_item);
/*
* If we got a cancel signal in prior command, quit
@@ -1274,7 +1283,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1292,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, execlockrelsinfo,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 4cf6db504f..9f5a40a0a6 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,16 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
+static void CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static List *AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams);
+static void ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,9 +792,21 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * If the CachedPlan is valid, AcquireExecutorLocks() may call
+ * ExecutorGetLockRels() on each PlannedStmt contained in it to determine the
+ * set of relations to lock, instead of just scanning its range table. That
+ * lets it skip relations scanned only by plan nodes that initial partition
+ * pruning finds need not be executed. The resulting ExecLockRelsInfo nodes,
+ * which contain the result of such pruning and are allocated in a child
+ * context of the context containing the plan itself, are saved in
+ * plan->execlockrelsinfo_list. The previous contents of the list from the
+ * last invocation on the same CachedPlan are deleted, because they would no
+ * longer be valid given the fresh set of parameter values which may be used
+ * as pruning parameters.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams)
{
CachedPlan *plan = plansource->gplan;
@@ -820,13 +834,25 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *execlockrelsinfo_list;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. If ExecutorGetLockRels() omits
+ * some relations because the plan nodes that scan them were found to
+ * be pruned, the executor must be told about the pruned nodes so that
+ * it doesn't accidentally try to execute them. That information
+ * travels in the ExecLockRelsInfo nodes collected in the returned
+ * list, which is passed to the executor along with the list of
+ * PlannedStmts.
+ */
+ execlockrelsinfo_list = AcquireExecutorLocks(plan->stmt_list,
+ boundParams);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -844,11 +870,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
if (plan->is_valid)
{
/* Successfully revalidated and locked the query. */
+
+ /* Remember ExecLockRelsInfos in the CachedPlan. */
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
return true;
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, execlockrelsinfo_list);
}
/*
@@ -880,7 +909,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv)
{
CachedPlan *plan;
- List *plist;
+ List *plist,
+ *execlockrelsinfo_list;
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
@@ -933,7 +963,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* Generate the plan.
*/
plist = pg_plan_queries(qlist, plansource->query_string,
- plansource->cursor_options, boundParams);
+ plansource->cursor_options, boundParams,
+ &execlockrelsinfo_list);
/* Release snapshot if we got one */
if (snapshot_set)
@@ -1002,6 +1033,16 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->is_saved = false;
plan->is_valid = true;
+ /*
+ * Save the dummy ExecLockRelsInfo list, that is a list containing NULLs
+ * as elements. We must do this, because users of the CachedPlan expect
+ * one to go with the list of PlannedStmts.
+ * XXX maybe get rid of that contract.
+ */
+ plan->execlockrelsinfo_context = NULL;
+ CachedPlanSaveExecLockRelsInfos(plan, execlockrelsinfo_list);
+ Assert(MemoryContextIsValid(plan->execlockrelsinfo_context));
+
/* assign generation number to new plan */
plan->generation = ++(plansource->generation);
@@ -1160,7 +1201,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1586,6 +1627,49 @@ CopyCachedPlan(CachedPlanSource *plansource)
return newsource;
}
+/*
+ * CachedPlanSaveExecLockRelsInfos
+ * Save the list containing ExecLockRelsInfo nodes into the given
+ * CachedPlan
+ *
+ * The provided list is copied into a dedicated context that is a child of
+ * plan->context. If the child context already exists, it is emptied, because
+ * any ExecLockRelsInfo contained therein would no longer be useful.
+ */
+static void
+CachedPlanSaveExecLockRelsInfos(CachedPlan *plan, List *execlockrelsinfo_list)
+{
+ MemoryContext execlockrelsinfo_context = plan->execlockrelsinfo_context,
+ oldcontext = CurrentMemoryContext;
+ List *execlockrelsinfo_list_copy;
+
+ /*
+ * Set up the dedicated context if not already done, saving it as a child
+ * of the CachedPlan's context.
+ */
+ if (execlockrelsinfo_context == NULL)
+ {
+ execlockrelsinfo_context = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlan execlockrelsinfo list",
+ ALLOCSET_START_SMALL_SIZES);
+ MemoryContextSetParent(execlockrelsinfo_context, plan->context);
+ MemoryContextSetIdentifier(execlockrelsinfo_context, plan->context->ident);
+ plan->execlockrelsinfo_context = execlockrelsinfo_context;
+ }
+ else
+ {
+ /* Just clear existing contents by resetting the context. */
+ Assert(MemoryContextIsValid(execlockrelsinfo_context));
+ MemoryContextReset(execlockrelsinfo_context);
+ }
+
+ MemoryContextSwitchTo(execlockrelsinfo_context);
+ execlockrelsinfo_list_copy = copyObject(execlockrelsinfo_list);
+ MemoryContextSwitchTo(oldcontext);
+
+ plan->execlockrelsinfo_list = execlockrelsinfo_list_copy;
+}
+
/*
* CachedPlanIsValid: test whether the rewritten querytree within a
* CachedPlanSource is currently valid (that is, not marked as being in need
@@ -1737,17 +1821,21 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * Returns a list of ExecLockRelsInfo nodes containing one element for each
+ * PlannedStmt in stmt_list or NULL if the latter is a utility statement or its
+ * containsInitialPruning is false.
*/
-static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+static List *
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams)
{
ListCell *lc1;
+ List *execlockrelsinfo_list = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ ExecLockRelsInfo *execlockrelsinfo = NULL;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,27 +1849,139 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
- continue;
+ ScanQueryForLocks(query, true);
}
-
- foreach(lc2, plannedstmt->rtable)
+ else
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (!plannedstmt->containsInitialPruning)
+ {
+ /*
+ * If the plan contains no initial pruning steps, just lock
+ * all the relations found in the range table.
+ */
+ ListCell *lc;
- if (rte->rtekind != RTE_RELATION)
- continue;
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ /*
+ * Acquire the appropriate type of lock on each relation
+ * OID. Note that we don't actually try to open the rel,
+ * and hence will not fail if it's been dropped entirely
+ * --- we'll just transiently acquire a non-conflicting
+ * lock.
+ */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ else
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ /*
+ * Walk the plan tree to find only the minimal set of
+ * relations to be locked, considering the effect of performing
+ * initial partition pruning.
+ */
+ execlockrelsinfo = ExecutorGetLockRels(plannedstmt, boundParams);
+ lockrels = execlockrelsinfo->lockrels;
+
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment above. */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ }
+
+ /*
+ * Remember ExecLockRelsInfo for later adding to the QueryDesc that
+ * will be passed to the executor when executing this plan. May be
+ * NULL, but must keep the list the same length as stmt_list.
+ */
+ execlockrelsinfo_list = lappend(execlockrelsinfo_list,
+ execlockrelsinfo);
+ }
+
+ return execlockrelsinfo_list;
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *execlockrelsinfo_list)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, execlockrelsinfo_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecLockRelsInfo *execlockrelsinfo = lfirst_node(ExecLockRelsInfo, lc2);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
/*
- * Acquire the appropriate type of lock on each relation OID. Note
- * that we don't actually try to open the rel, and hence will not
- * fail if it's been dropped entirely --- we'll just transiently
- * acquire a non-conflicting lock.
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ }
+ else
+ {
+ if (execlockrelsinfo == NULL)
+ {
+ ListCell *lc;
+
+ foreach(lc, plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst(lc);
+
+ if (rte->rtekind != RTE_RELATION)
+ continue;
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ {
+ int rti;
+ Bitmapset *lockrels;
+
+ lockrels = execlockrelsinfo->lockrels;
+ rti = -1;
+ while ((rti = bms_next_member(lockrels, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..896f51be08 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -285,6 +285,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan)
{
AssertArg(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->execlockrelsinfos = execlockrelsinfos;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..fef75ba147 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecLockRelsInfo *execlockrelsinfo,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index fd5735a946..ded19b8cbb 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,4 +124,6 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
PartitionPruneInfo *pruneinfo,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
+extern Bitmapset *ExecGetLockRelsDoInitialPruning(Plan *plan, ExecGetLockRelsContext *context,
+ PartitionPruneInfo *pruneinfo);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..4338463479 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecLockRelsInfo *execlockrelsinfo; /* ExecutorGetLockRels()'s output given plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +58,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecLockRelsInfo *execlockrelsinfo,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 873772f188..d03bd5a026 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern ExecLockRelsInfo *ExecutorGetLockRels(PlannedStmt *plannedstmt, ParamListInfo params);
+extern bool ExecGetLockRels(Plan *node, ExecGetLockRelsContext *context);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/executor/nodeAppend.h b/src/include/executor/nodeAppend.h
index 4cb78ee5b6..b53535c2a4 100644
--- a/src/include/executor/nodeAppend.h
+++ b/src/include/executor/nodeAppend.h
@@ -17,6 +17,7 @@
#include "access/parallel.h"
#include "nodes/execnodes.h"
+extern bool ExecGetAppendLockRels(Append *node, ExecGetLockRelsContext *context);
extern AppendState *ExecInitAppend(Append *node, EState *estate, int eflags);
extern void ExecEndAppend(AppendState *node);
extern void ExecReScanAppend(AppendState *node);
diff --git a/src/include/executor/nodeMergeAppend.h b/src/include/executor/nodeMergeAppend.h
index 97fe3b0665..8eb4e9df93 100644
--- a/src/include/executor/nodeMergeAppend.h
+++ b/src/include/executor/nodeMergeAppend.h
@@ -16,6 +16,7 @@
#include "nodes/execnodes.h"
+extern bool ExecGetMergeAppendLockRels(MergeAppend *node, ExecGetLockRelsContext *context);
extern MergeAppendState *ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags);
extern void ExecEndMergeAppend(MergeAppendState *node);
extern void ExecReScanMergeAppend(MergeAppendState *node);
diff --git a/src/include/executor/nodeModifyTable.h b/src/include/executor/nodeModifyTable.h
index c318681b9a..287baf6257 100644
--- a/src/include/executor/nodeModifyTable.h
+++ b/src/include/executor/nodeModifyTable.h
@@ -19,6 +19,7 @@ extern void ExecComputeStoredGenerated(ResultRelInfo *resultRelInfo,
EState *estate, TupleTableSlot *slot,
CmdType cmdtype);
+extern bool ExecGetModifyTableLockRels(ModifyTable *plan, ExecGetLockRelsContext *context);
extern ModifyTableState *ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags);
extern void ExecEndModifyTable(ModifyTableState *node);
extern void ExecReScanModifyTable(ModifyTableState *node);
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index cbbcff81d2..ee0c73e9a4 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -596,6 +596,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ struct ExecLockRelsInfo *es_execlockrelsinfo; /* QueryDesc.execlockrelsinfo */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
@@ -984,6 +985,101 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * ExecLockRelsInfo
+ *
+ * Result of performing ExecutorGetLockRels() for a given PlannedStmt
+ */
+typedef struct ExecLockRelsInfo
+{
+ NodeTag type;
+
+ /*
+ * Relations that must be locked to execute the plan tree contained in
+ * the PlannedStmt.
+ */
+ Bitmapset *lockrels;
+
+ /* PlannedStmt.numPlanNodes */
+ int numPlanNodes;
+
+ /*
+ * List of PlanInitPruningOutput, each representing the output of
+ * performing initial pruning on a given plan node, for all nodes in the
+ * plan tree that have been marked as needing initial pruning.
+ *
+ * 'ipoIndexes' is an array of 'numPlanNodes' elements, indexed with
+ * plan_node_id of the individual nodes in the plan tree, each a 1-based
+ * index into 'initPruningOutputs' list for a given plan node. 0 means
+ * that a given plan node has no entry in the list because of not needing
+ * any initial pruning done on it.
+ */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecLockRelsInfo;
+
+/*----------------
+ * ExecGetLockRelsContext
+ *
+ * Information pertaining to ExecutorGetLockRels() invocation for a given
+ * plan.
+ */
+typedef struct ExecGetLockRelsContext
+{
+ NodeTag type;
+
+ PlannedStmt *stmt; /* target plan */
+ ParamListInfo params; /* EXTERN parameters available for pruning */
+
+ /* Output parameters for ExecGetLockRels and its subroutines. */
+ Bitmapset *lockrels;
+
+ /* See the comment in the definition of ExecLockRelsInfo struct. */
+ List *initPruningOutputs;
+ int *ipoIndexes;
+} ExecGetLockRelsContext;
+
+/*
+ * Appends the provided PlanInitPruningOutput to
+ * ExecGetLockRelsContext.initPruningOutputs
+ */
+#define ExecStorePlanInitPruningOutput(cxt, initPruningOutput, plannode) \
+ do { \
+ (cxt)->initPruningOutputs = lappend((cxt)->initPruningOutputs, initPruningOutput); \
+ (cxt)->ipoIndexes[(plannode)->plan_node_id] = list_length((cxt)->initPruningOutputs); \
+ } while (0)
+
+/*
+ * Finds the PlanInitPruningOutput for a given Plan node in
+ * ExecLockRelsInfo.initPruningOutputs.
+ */
+#define ExecFetchPlanInitPruningOutput(execlockrelsinfo, plannode) \
+ (((execlockrelsinfo) != NULL && (execlockrelsinfo)->initPruningOutputs != NIL) ? \
+ list_nth((execlockrelsinfo)->initPruningOutputs, \
+ (execlockrelsinfo)->ipoIndexes[(plannode)->plan_node_id] - 1) : NULL)
+
+/* ---------------
+ * PlanInitPruningOutput
+ *
+ * Node to remember the result of performing initial partition pruning steps
+ * during ExecutorGetLockRels() on nodes that support pruning.
+ *
+ * ExecGetLockRelsDoInitialPruning(), which runs during ExecutorGetLockRels(),
+ * creates it and stores it in the corresponding ExecLockRelsInfo.
+ *
+ * ExecInitPartitionPruning(), which runs during ExecutorStart(), fetches it
+ * from the EState's ExecLockRelsInfo (if any) and uses the value of
+ * initially_valid_subplans contained in it as-is to select the subplans to be
+ * initialized for execution, instead of re-evaluating that by performing
+ * initial pruning again.
+ */
+typedef struct PlanInitPruningOutput
+{
+ NodeTag type;
+
+ Bitmapset *initially_valid_subplans;
+} PlanInitPruningOutput;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 53f6b05a3f..928a30c7c6 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -97,6 +97,11 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_ExecGetLockRelsContext,
+ T_ExecLockRelsInfo,
+ T_PlanInitPruningOutput,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index ef9b54739a..0ed171d3f5 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -129,6 +129,10 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
PartitionDirectory partition_directory; /* partition descriptors */
Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index a823c7c20d..4fcba0e55c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -60,10 +60,16 @@ typedef struct PlannedStmt
bool parallelModeNeeded; /* parallel mode required to execute? */
+ bool containsInitialPruning; /* Do some Plan nodes in the tree
+ * have initial (pre-exec) pruning
+ * steps? */
+
int jitFlags; /* which forms of JIT should be performed */
struct Plan *planTree; /* tree of Plan nodes */
+ int numPlanNodes; /* number of nodes in planTree */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -1192,6 +1198,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1200,6 +1213,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
diff --git a/src/include/tcop/tcopprot.h b/src/include/tcop/tcopprot.h
index 92291a750d..bf80c53bed 100644
--- a/src/include/tcop/tcopprot.h
+++ b/src/include/tcop/tcopprot.h
@@ -64,7 +64,7 @@ extern PlannedStmt *pg_plan_query(Query *querytree, const char *query_string,
ParamListInfo boundParams);
extern List *pg_plan_queries(List *querytrees, const char *query_string,
int cursorOptions,
- ParamListInfo boundParams);
+ ParamListInfo boundParams, List **execlockrelsinfo_list);
extern bool check_max_stack_depth(int *newval, void **extra, GucSource source);
extern void assign_max_stack_depth(int newval, void *extra);
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 95b99e3d25..56b0dcc6bd 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -148,6 +148,9 @@ typedef struct CachedPlan
{
int magic; /* should equal CACHEDPLAN_MAGIC */
List *stmt_list; /* list of PlannedStmts */
+ List *execlockrelsinfo_list; /* list of ExecLockRelsInfo with one
+ * element for each of stmt_list; NIL
+ * if not a generic plan */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
bool is_valid; /* is the stmt_list currently valid? */
@@ -158,6 +161,9 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
+ MemoryContext execlockrelsinfo_context; /* context containing
+ * execlockrelsinfo_list,
+ * a child of the above context */
} CachedPlan;
/*
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9abace6734 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,10 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *execlockrelsinfos; /* list of ExecLockRelsInfos with one element
+ * for each of 'stmts'; same as
+ * cplan->execlockrelsinfo_list if cplan is
+ * not NULL */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -241,6 +245,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *execlockrelsinfos,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.24.1
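A note on the ipoIndexes/initPruningOutputs pairing in ExecLockRelsInfo above: the header comment describes a per-plan-node array of 1-based indexes into a dense list, with 0 meaning "no entry". The following is only a standalone sketch of that indexing scheme, not code from the patch. All names in it (PruneOutput, PruneOutputTable, table_create, table_store, table_fetch, table_free) are hypothetical, a plain array and a bitmask stand in for the List and Bitmapset, and the explicit check for index 0 in table_fetch is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for PlanInitPruningOutput. */
typedef struct PruneOutput
{
    unsigned long valid_subplans;   /* bitmask of surviving subplans */
} PruneOutput;

/* Hypothetical stand-in for the initPruningOutputs/ipoIndexes pair. */
typedef struct PruneOutputTable
{
    int          num_plan_nodes;    /* size of ipo_indexes */
    int          num_outputs;       /* entries used in outputs */
    PruneOutput *outputs;           /* dense, one per node that was pruned */
    int         *ipo_indexes;       /* by plan_node_id: 1-based, 0 = none */
} PruneOutputTable;

static PruneOutputTable *
table_create(int num_plan_nodes)
{
    PruneOutputTable *t;

    assert(num_plan_nodes > 0);
    t = malloc(sizeof(*t));
    if (t == NULL)
        return NULL;
    t->num_plan_nodes = num_plan_nodes;
    t->num_outputs = 0;
    /* calloc zeroes ipo_indexes, so every node starts out as "no entry" */
    t->outputs = calloc((size_t) num_plan_nodes, sizeof(PruneOutput));
    t->ipo_indexes = calloc((size_t) num_plan_nodes, sizeof(int));
    if (t->outputs == NULL || t->ipo_indexes == NULL)
    {
        free(t->outputs);
        free(t->ipo_indexes);
        free(t);
        return NULL;
    }
    return t;
}

/* Append an output and record its 1-based position for the plan node. */
static void
table_store(PruneOutputTable *t, int plan_node_id, unsigned long valid_subplans)
{
    assert(plan_node_id >= 0 && plan_node_id < t->num_plan_nodes);
    assert(t->ipo_indexes[plan_node_id] == 0);  /* store once per node */
    t->outputs[t->num_outputs].valid_subplans = valid_subplans;
    t->num_outputs++;
    t->ipo_indexes[plan_node_id] = t->num_outputs;
}

/* Return the node's output, or NULL if it has none (index 0). */
static const PruneOutput *
table_fetch(const PruneOutputTable *t, int plan_node_id)
{
    int idx;

    if (t == NULL)
        return NULL;
    assert(plan_node_id >= 0 && plan_node_id < t->num_plan_nodes);
    idx = t->ipo_indexes[plan_node_id];
    return (idx == 0) ? NULL : &t->outputs[idx - 1];
}

static void
table_free(PruneOutputTable *t)
{
    free(t->outputs);
    free(t->ipo_indexes);
    free(t);
}
```

Storing the list length after the append is what makes the index 1-based, so a zero-initialized array needs no separate "unset" marker.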
[application/octet-stream] v8-0001-Some-refactoring-of-runtime-pruning-code.patch (26.5K, 3-v8-0001-Some-refactoring-of-runtime-pruning-code.patch)
From ce2041b254a7fee3097012f11685b635d58fb9b2 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 2 Mar 2022 15:17:55 +0900
Subject: [PATCH v8 1/4] Some refactoring of runtime pruning code
This does two things mainly:
* Move the execution pruning initialization steps that are common
between both ExecInitAppend() and ExecInitMergeAppend() into a new
function ExecInitPartitionPruning() defined in execPartition.c.
Thus, ExecCreatePartitionPruneState() and
ExecFindInitialMatchingSubPlans() need not be exported.
* Add an ExprContext field to PartitionPruneContext, so that the runtime
pruning code no longer implicitly assumes that the PlanState can always
provide the ExprContext needed to compute pruning expressions. A
future patch will allow runtime pruning (at least the initial pruning
steps) to be performed before the corresponding PlanState has been
created, so this will help.
---
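Reviewer note, not part of the commit message: the new ExecInitPartitionPruning() re-sequences the subplan indexes kept in the PartitionPruneState when initial pruning has removed some subplans. The following is a rough standalone sketch of that renumbering, assuming a flat partition-to-subplan map in which -1 means "no subplan". The function name resequence_subplan_map, the MAX_SUBPLANS limit and the bool array are hypothetical simplifications; the patch itself works on Bitmapsets and nested pruning data.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SUBPLANS 64     /* arbitrary limit for this sketch */

/*
 * Rewrite subplan_map[] in place so that it refers to positions among the
 * surviving subplans only.  valid[i] says whether old subplan i survived
 * initial pruning.  Entries for pruned subplans become -1.  Returns the
 * number of surviving subplans.
 */
static int
resequence_subplan_map(int *subplan_map, int nparts,
                       const bool *valid, int n_total_subplans)
{
    int new_index[MAX_SUBPLANS];
    int next = 0;
    int i;

    assert(n_total_subplans >= 0 && n_total_subplans <= MAX_SUBPLANS);

    /* Old subplan index -> new index, or -1 if the subplan was pruned. */
    for (i = 0; i < n_total_subplans; i++)
        new_index[i] = valid[i] ? next++ : -1;

    /* Translate every partition's entry; -1 entries stay as they are. */
    for (i = 0; i < nparts; i++)
    {
        int old = subplan_map[i];

        if (old >= 0)
        {
            assert(old < n_total_subplans);
            subplan_map[i] = new_index[old];
        }
    }

    return next;
}
```

As the comment in ExecInitPartitionPruning() points out, this step can be skipped when no execution-time pruning will follow, because the map is not consulted again in that case.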
src/backend/executor/execPartition.c | 340 ++++++++++++++++---------
src/backend/executor/nodeAppend.c | 33 +--
src/backend/executor/nodeMergeAppend.c | 32 +--
src/backend/partitioning/partprune.c | 20 +-
src/include/executor/execPartition.h | 9 +-
src/include/partitioning/partprune.h | 2 +
6 files changed, 252 insertions(+), 184 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index aca42ca5b8..84b4e4b3d6 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -184,11 +184,18 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
int maxfieldlen);
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
+static PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *partitionpruneinfo);
+static Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate);
static void ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate);
+ PlanState *planstate,
+ ExprContext *econtext);
+static void PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans);
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
@@ -1590,30 +1597,86 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
- * ExecCreatePartitionPruneState:
+ * ExecInitPartitionPruning:
* Creates the PartitionPruneState required by each of the two pruning
* functions. Details stored include how to map the partition index
- * returned by the partition pruning code into subplan indexes.
- *
- * ExecFindInitialMatchingSubPlans:
- * Returns indexes of matching subplans. Partition pruning is attempted
- * without any evaluation of expressions containing PARAM_EXEC Params.
- * This function must be called during executor startup for the parent
- * plan before the subplans themselves are initialized. Subplans which
- * are found not to match by this function must be removed from the
- * plan's list of subplans during execution, as this function performs a
- * remap of the partition index to subplan index map and the newly
- * created map provides indexes only for subplans which remain after
- * calling this function.
+ * returned by the partition pruning code into subplan indexes. Also
+ * determines the set of initially valid subplans by performing initial
+ * pruning steps; only those subplans need to be initialized by a caller
+ * such as ExecInitAppend. Maps in PartitionPruneState are updated to account
+ * for initial pruning having eliminated some of the subplans, if any.
*
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating all available
- * expressions. This function can only be called during execution and
- * must be called again each time the value of a Param listed in
- * PartitionPruneState's 'execparamids' changes.
+ * expressions, that is, using execution pruning steps. This function can
+ * only be called during execution and must be called again each time
+ * the value of a Param listed in PartitionPruneState's 'execparamids'
+ * changes.
*-------------------------------------------------------------------------
*/
+/*
+ * ExecInitPartitionPruning
+ * Initialize data structure needed for run-time partition pruning
+ *
+ * Initial pruning can be done immediately, so it is done here if needed and
+ * the set of surviving partition subplans' indexes are added to the output
+ * parameter *initially_valid_subplans.
+ *
+ * If subplans are indeed pruned, subplan_map arrays contained in the returned
+ * PartitionPruneState are re-sequenced to not count those, though only if the
+ * maps will be needed for subsequent execution pruning passes.
+ */
+PartitionPruneState *
+ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans)
+{
+ PartitionPruneState *prunestate;
+ EState *estate = planstate->state;
+
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /*
+ * Create the working data structure for pruning.
+ */
+ prunestate = ExecCreatePartitionPruneState(planstate, pruneinfo);
+
+ /*
+ * Perform an initial partition prune, if required.
+ */
+ if (prunestate->do_initial_prune)
+ {
+ /* Determine which subplans survive initial pruning */
+ *initially_valid_subplans = ExecFindInitialMatchingSubPlans(prunestate);
+ }
+ else
+ {
+ /* We'll need to initialize all subplans */
+ Assert(n_total_subplans > 0);
+ *initially_valid_subplans = bms_add_range(NULL, 0,
+ n_total_subplans - 1);
+ }
+
+ /*
+ * Re-sequence subplan indexes contained in prunestate to account for any
+ * that were removed above due to initial pruning.
+ *
+ * We can safely skip this when !do_exec_prune, even though that leaves
+ * invalid data in prunestate, because that data won't be consulted again
+ * (cf initial Assert in ExecFindMatchingSubPlans).
+ */
+ if (prunestate->do_exec_prune &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ PartitionPruneStateFixSubPlanMap(prunestate,
+ *initially_valid_subplans,
+ n_total_subplans);
+
+ return prunestate;
+}
+
/*
* ExecCreatePartitionPruneState
* Build the data structure required for calling
@@ -1632,7 +1695,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* re-used each time we re-evaluate which partitions match the pruning steps
* provided in each PartitionedRelPruneInfo.
*/
-PartitionPruneState *
+static PartitionPruneState *
ExecCreatePartitionPruneState(PlanState *planstate,
PartitionPruneInfo *partitionpruneinfo)
{
@@ -1641,6 +1704,7 @@ ExecCreatePartitionPruneState(PlanState *planstate,
int n_part_hierarchies;
ListCell *lc;
int i;
+ ExprContext *econtext = planstate->ps_ExprContext;
/* For data reading, executor always omits detached partitions */
if (estate->es_partition_directory == NULL)
@@ -1814,7 +1878,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether initial pruning is needed at any level */
prunestate->do_initial_prune = true;
}
@@ -1823,7 +1888,8 @@ ExecCreatePartitionPruneState(PlanState *planstate,
{
ExecInitPruningContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
- partdesc, partkey, planstate);
+ partdesc, partkey, planstate,
+ econtext);
/* Record whether exec pruning is needed at any level */
prunestate->do_exec_prune = true;
}
@@ -1851,7 +1917,8 @@ ExecInitPruningContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
PartitionKey partkey,
- PlanState *planstate)
+ PlanState *planstate,
+ ExprContext *econtext)
{
int n_steps;
int partnatts;
@@ -1872,6 +1939,7 @@ ExecInitPruningContext(PartitionPruneContext *context,
context->ppccontext = CurrentMemoryContext;
context->planstate = planstate;
+ context->exprcontext = econtext;
/* Initialize expression state for each expression we need */
context->exprstates = (ExprState **)
@@ -1900,8 +1968,20 @@ ExecInitPruningContext(PartitionPruneContext *context,
step->step.step_id,
keyno);
- context->exprstates[stateidx] =
- ExecInitExpr(expr, context->planstate);
+ /*
+ * When planstate is NULL, pruning_steps is known not to
+ * contain any expressions that depend on the parent plan.
+ * In that case, values of any available EXTERN parameters
+ * must be passed explicitly; the caller must have made
+ * them available via econtext.
+ */
+ if (planstate == NULL)
+ context->exprstates[stateidx] =
+ ExecInitExprWithParams(expr,
+ econtext->ecxt_param_list_info);
+ else
+ context->exprstates[stateidx] =
+ ExecInitExpr(expr, context->planstate);
}
keyno++;
}
@@ -1914,18 +1994,11 @@ ExecInitPruningContext(PartitionPruneContext *context,
* pruning, disregarding any pruning constraints involving PARAM_EXEC
* Params.
*
- * If additional pruning passes will be required (because of PARAM_EXEC
- * Params), we must also update the translation data that allows conversion
- * of partition indexes into subplan indexes to account for the unneeded
- * subplans having been removed.
- *
* Must only be called once per 'prunestate', and only if initial pruning
* is required.
- *
- * 'nsubplans' must be passed as the total number of unpruned subplans.
*/
-Bitmapset *
-ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
+static Bitmapset *
+ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -1950,14 +2023,20 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
PartitionedRelPruningData *pprune;
prunedata = prunestate->partprunedata[i];
+
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
pprune = &prunedata->partrelprunedata[0];
/* Perform pruning without using PARAM_EXEC Params */
find_matching_subplans_recurse(prunedata, pprune, true, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->initial_pruning_steps)
- ResetExprContext(pprune->initial_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->initial_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
@@ -1970,118 +2049,120 @@ ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate, int nsubplans)
MemoryContextReset(prunestate->prune_context);
+ return result;
+}
+
+/*
+ * PartitionPruneStateFixSubPlanMap
+ * Fix mapping of partition indexes to subplan indexes contained in
+ * prunestate by considering the new list of subplans that survived
+ * initial pruning
+ *
+ * Subplans previously indexed 0..(n_total_subplans - 1) are renumbered to
+ * the index range 0..(num(initially_valid_subplans) - 1).
+ */
+static void
+PartitionPruneStateFixSubPlanMap(PartitionPruneState *prunestate,
+ Bitmapset *initially_valid_subplans,
+ int n_total_subplans)
+{
+ int *new_subplan_indexes;
+ Bitmapset *new_other_subplans;
+ int i;
+ int newidx;
+
/*
- * If exec-time pruning is required and we pruned subplans above, then we
- * must re-sequence the subplan indexes so that ExecFindMatchingSubPlans
- * properly returns the indexes from the subplans which will remain after
- * execution of this function.
- *
- * We can safely skip this when !do_exec_prune, even though that leaves
- * invalid data in prunestate, because that data won't be consulted again
- * (cf initial Assert in ExecFindMatchingSubPlans).
+ * First we must build a temporary array which maps old subplan
+ * indexes to new ones. For convenience of initialization, we use
+ * 1-based indexes in this array and leave pruned items as 0.
*/
- if (prunestate->do_exec_prune && bms_num_members(result) < nsubplans)
+ new_subplan_indexes = (int *) palloc0(sizeof(int) * n_total_subplans);
+ newidx = 1;
+ i = -1;
+ while ((i = bms_next_member(initially_valid_subplans, i)) >= 0)
{
- int *new_subplan_indexes;
- Bitmapset *new_other_subplans;
- int i;
- int newidx;
+ Assert(i < n_total_subplans);
+ new_subplan_indexes[i] = newidx++;
+ }
- /*
- * First we must build a temporary array which maps old subplan
- * indexes to new ones. For convenience of initialization, we use
- * 1-based indexes in this array and leave pruned items as 0.
- */
- new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
- newidx = 1;
- i = -1;
- while ((i = bms_next_member(result, i)) >= 0)
- {
- Assert(i < nsubplans);
- new_subplan_indexes[i] = newidx++;
- }
+ /*
+ * Now we can update each PartitionedRelPruneInfo's subplan_map with
+ * new subplan indexes. We must also recompute its present_parts
+ * bitmap.
+ */
+ for (i = 0; i < prunestate->num_partprunedata; i++)
+ {
+ PartitionPruningData *prunedata = prunestate->partprunedata[i];
+ int j;
/*
- * Now we can update each PartitionedRelPruneInfo's subplan_map with
- * new subplan indexes. We must also recompute its present_parts
- * bitmap.
+ * Within each hierarchy, we perform this loop in back-to-front
+ * order so that we determine present_parts for the lowest-level
+ * partitioned tables first. This way we can tell whether a
+ * sub-partitioned table's partitions were entirely pruned so we
+ * can exclude it from the current level's present_parts.
*/
- for (i = 0; i < prunestate->num_partprunedata; i++)
+ for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
{
- PartitionPruningData *prunedata = prunestate->partprunedata[i];
- int j;
+ PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
+ int nparts = pprune->nparts;
+ int k;
- /*
- * Within each hierarchy, we perform this loop in back-to-front
- * order so that we determine present_parts for the lowest-level
- * partitioned tables first. This way we can tell whether a
- * sub-partitioned table's partitions were entirely pruned so we
- * can exclude it from the current level's present_parts.
- */
- for (j = prunedata->num_partrelprunedata - 1; j >= 0; j--)
- {
- PartitionedRelPruningData *pprune = &prunedata->partrelprunedata[j];
- int nparts = pprune->nparts;
- int k;
+ /* We just rebuild present_parts from scratch */
+ bms_free(pprune->present_parts);
+ pprune->present_parts = NULL;
- /* We just rebuild present_parts from scratch */
- bms_free(pprune->present_parts);
- pprune->present_parts = NULL;
+ for (k = 0; k < nparts; k++)
+ {
+ int oldidx = pprune->subplan_map[k];
+ int subidx;
- for (k = 0; k < nparts; k++)
+ /*
+ * If this partition existed as a subplan then change the
+ * old subplan index to the new subplan index. The new
+ * index may become -1 if the partition was pruned above,
+ * or it may just come earlier in the subplan list due to
+ * some subplans being removed earlier in the list. If
+ * it's a subpartition, add it to present_parts unless
+ * it's entirely pruned.
+ */
+ if (oldidx >= 0)
{
- int oldidx = pprune->subplan_map[k];
- int subidx;
-
- /*
- * If this partition existed as a subplan then change the
- * old subplan index to the new subplan index. The new
- * index may become -1 if the partition was pruned above,
- * or it may just come earlier in the subplan list due to
- * some subplans being removed earlier in the list. If
- * it's a subpartition, add it to present_parts unless
- * it's entirely pruned.
- */
- if (oldidx >= 0)
- {
- Assert(oldidx < nsubplans);
- pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
+ Assert(oldidx < n_total_subplans);
+ pprune->subplan_map[k] = new_subplan_indexes[oldidx] - 1;
- if (new_subplan_indexes[oldidx] > 0)
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
- else if ((subidx = pprune->subpart_map[k]) >= 0)
- {
- PartitionedRelPruningData *subprune;
+ if (new_subplan_indexes[oldidx] > 0)
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
+ }
+ else if ((subidx = pprune->subpart_map[k]) >= 0)
+ {
+ PartitionedRelPruningData *subprune;
- subprune = &prunedata->partrelprunedata[subidx];
+ subprune = &prunedata->partrelprunedata[subidx];
- if (!bms_is_empty(subprune->present_parts))
- pprune->present_parts =
- bms_add_member(pprune->present_parts, k);
- }
+ if (!bms_is_empty(subprune->present_parts))
+ pprune->present_parts =
+ bms_add_member(pprune->present_parts, k);
}
}
}
+ }
- /*
- * We must also recompute the other_subplans set, since indexes in it
- * may change.
- */
- new_other_subplans = NULL;
- i = -1;
- while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
- new_other_subplans = bms_add_member(new_other_subplans,
- new_subplan_indexes[i] - 1);
-
- bms_free(prunestate->other_subplans);
- prunestate->other_subplans = new_other_subplans;
+ /*
+ * We must also recompute the other_subplans set, since indexes in it
+ * may change.
+ */
+ new_other_subplans = NULL;
+ i = -1;
+ while ((i = bms_next_member(prunestate->other_subplans, i)) >= 0)
+ new_other_subplans = bms_add_member(new_other_subplans,
+ new_subplan_indexes[i] - 1);
- pfree(new_subplan_indexes);
- }
+ bms_free(prunestate->other_subplans);
+ prunestate->other_subplans = new_other_subplans;
- return result;
+ pfree(new_subplan_indexes);
}
/*
@@ -2123,11 +2204,16 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate)
prunedata = prunestate->partprunedata[i];
pprune = &prunedata->partrelprunedata[0];
+ /*
+ * We pass the 1st item belonging to the root table of the hierarchy
+ * and find_matching_subplans_recurse() takes care of recursing to
+ * other (lower-level) parents as needed.
+ */
find_matching_subplans_recurse(prunedata, pprune, false, &result);
- /* Expression eval may have used space in node's ps_ExprContext too */
+ /* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
- ResetExprContext(pprune->exec_context.planstate->ps_ExprContext);
+ ResetExprContext(pprune->exec_context.exprcontext);
}
/* Add in any subplans that partition pruning didn't account for */
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 7937f1c88f..5b6d3eb23b 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -138,30 +138,17 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &appendstate->ps);
-
- /* Create the working data structure for pruning. */
- prunestate = ExecCreatePartitionPruneState(&appendstate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&appendstate->ps,
+ list_length(node->appendplans),
+ node->part_prune_info,
+ &validsubplans);
appendstate->as_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->appendplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->appendplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 418f89dea8..9a9f29e845 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -86,29 +86,17 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
{
PartitionPruneState *prunestate;
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, &mergestate->ps);
-
- prunestate = ExecCreatePartitionPruneState(&mergestate->ps,
- node->part_prune_info);
+ /*
+ * Set up pruning data structure. Initial pruning steps, if any, are
+ * performed as part of the setup, adding the set of indexes of
+ * surviving subplans to 'validsubplans'.
+ */
+ prunestate = ExecInitPartitionPruning(&mergestate->ps,
+ list_length(node->mergeplans),
+ node->part_prune_info,
+ &validsubplans);
mergestate->ms_prune_state = prunestate;
-
- /* Perform an initial partition prune, if required. */
- if (prunestate->do_initial_prune)
- {
- /* Determine which subplans survive initial pruning */
- validsubplans = ExecFindInitialMatchingSubPlans(prunestate,
- list_length(node->mergeplans));
-
- nplans = bms_num_members(validsubplans);
- }
- else
- {
- /* We'll need to initialize all subplans */
- nplans = list_length(node->mergeplans);
- Assert(nplans > 0);
- validsubplans = bms_add_range(NULL, 0, nplans - 1);
- }
+ nplans = bms_num_members(validsubplans);
/*
* When no run-time pruning is required and there's at least one
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 1bc00826c1..7080cb25d9 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -798,6 +798,7 @@ prune_append_rel_partitions(RelOptInfo *rel)
/* These are not valid when being called from the planner */
context.planstate = NULL;
+ context.exprcontext = NULL;
context.exprstates = NULL;
/* Actual pruning happens here. */
@@ -808,8 +809,8 @@ prune_append_rel_partitions(RelOptInfo *rel)
* get_matching_partitions
* Determine partitions that survive partition pruning
*
- * Note: context->planstate must be set to a valid PlanState when the
- * pruning_steps were generated with a target other than PARTTARGET_PLANNER.
+ * Note: context->exprcontext must be valid when the pruning_steps were
+ * generated with a target other than PARTTARGET_PLANNER.
*
* Returns a Bitmapset of the RelOptInfo->part_rels indexes of the surviving
* partitions.
@@ -3654,7 +3655,7 @@ match_boolean_partition_clause(Oid partopfamily, Expr *clause, Expr *partkey,
* exprstate array.
*
* Note that the evaluated result may be in the per-tuple memory context of
- * context->planstate->ps_ExprContext, and we may have leaked other memory
+ * context->exprcontext, and we may have leaked other memory
* there too. This memory must be recovered by resetting that ExprContext
* after we're done with the pruning operation (see execPartition.c).
*/
@@ -3677,13 +3678,18 @@ partkey_datum_from_expr(PartitionPruneContext *context,
ExprContext *ectx;
/*
- * We should never see a non-Const in a step unless we're running in
- * the executor.
+ * We should never see a non-Const in a step unless the caller has
+ * passed a valid ExprContext.
+ *
+ * When context->planstate is valid, context->exprcontext is the same
+ * as context->planstate->ps_ExprContext.
*/
- Assert(context->planstate != NULL);
+ Assert(context->planstate != NULL || context->exprcontext != NULL);
+ Assert(context->planstate == NULL ||
+ (context->exprcontext == context->planstate->ps_ExprContext));
exprstate = context->exprstates[stateidx];
- ectx = context->planstate->ps_ExprContext;
+ ectx = context->exprcontext;
*value = ExecEvalExprSwitchContext(exprstate, ectx, isnull);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 603d8becc4..fd5735a946 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -119,10 +119,9 @@ extern ResultRelInfo *ExecFindPartition(ModifyTableState *mtstate,
EState *estate);
extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
PartitionTupleRouting *proute);
-extern PartitionPruneState *ExecCreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *partitionpruneinfo);
+extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
+ int n_total_subplans,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate);
-extern Bitmapset *ExecFindInitialMatchingSubPlans(PartitionPruneState *prunestate,
- int nsubplans);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index ee11b6feae..90684efa25 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -41,6 +41,7 @@ struct RelOptInfo;
* subsidiary data, such as the FmgrInfos.
* planstate Points to the parent plan node's PlanState when called
* during execution; NULL when called from the planner.
+ * exprcontext ExprContext to use when evaluating pruning expressions
* exprstates Array of ExprStates, indexed as per PruneCxtStateIdx; one
* for each partition key in each pruning step. Allocated if
* planstate is non-NULL, otherwise NULL.
@@ -56,6 +57,7 @@ typedef struct PartitionPruneContext
FmgrInfo *stepcmpfuncs;
MemoryContext ppccontext;
PlanState *planstate;
+ ExprContext *exprcontext;
ExprState **exprstates;
} PartitionPruneContext;
--
2.24.1
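The index re-sequencing done by PartitionPruneStateFixSubPlanMap() in the patch above may be easier to follow on a toy example. What follows is only a sketch under simplifying assumptions, not PostgreSQL code: fix_subplan_map() is a hypothetical name, plain int arrays stand in for Bitmapset and subplan_map, and present_parts and other_subplans are left out. It shows the 1-based temporary array trick, where pruned entries stay 0 and therefore become -1 after the final subtraction.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Toy model of the re-sequencing step (hypothetical, not PostgreSQL code).
 *
 * 'valid[i]' is nonzero if old subplan i survived initial pruning.
 * 'subplan_map[k]' holds the old subplan index of partition k, or -1 if
 * the partition has no subplan.  On return, the map holds new indexes.
 */
static void
fix_subplan_map(int *subplan_map, int nparts,
                const int *valid, int n_total)
{
    /* 1-based temporary map; calloc leaves pruned entries as 0 */
    int *new_idx = calloc((size_t) n_total, sizeof(int));
    int  next = 1;

    if (new_idx == NULL)
        abort();

    /* Number the surviving subplans 1, 2, 3, ... in their old order */
    for (int i = 0; i < n_total; i++)
    {
        if (valid[i])
            new_idx[i] = next++;
    }

    /* Translate each old index; a pruned subplan becomes 0 - 1 = -1 */
    for (int k = 0; k < nparts; k++)
    {
        int old = subplan_map[k];

        if (old >= 0)
        {
            assert(old < n_total);
            subplan_map[k] = new_idx[old] - 1;
        }
    }

    free(new_idx);
}
```

With four subplans of which only the second and fourth survive, a map of {0, 1, 2, 3} becomes {-1, 0, -1, 1}: surviving subplans move up to fill the gaps and pruned ones are marked as absent.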
[application/octet-stream] v8-0003-Add-a-plan_tree_walker.patch (3.9K, 4-v8-0003-Add-a-plan_tree_walker.patch)
From 3f3bfe578401c43e578196f46f2bad7d3071411a Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Thu, 3 Mar 2022 16:04:13 +0900
Subject: [PATCH v8 3/4] Add a plan_tree_walker()
Like planstate_tree_walker() but for uninitialized plan trees.
---
src/backend/nodes/nodeFuncs.c | 116 ++++++++++++++++++++++++++++++++++
src/include/nodes/nodeFuncs.h | 3 +
2 files changed, 119 insertions(+)
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 4789ba6911..51cac40a3e 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -31,6 +31,10 @@ static bool planstate_walk_subplans(List *plans, bool (*walker) (),
void *context);
static bool planstate_walk_members(PlanState **planstates, int nplans,
bool (*walker) (), void *context);
+static bool plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context);
+static bool plan_walk_members(List *plans, bool (*walker) (), void *context);
/*
@@ -4645,3 +4649,115 @@ planstate_walk_members(PlanState **planstates, int nplans,
return false;
}
+
+/*
+ * plan_tree_walker --- walk plan trees
+ *
+ * The walker has already visited the current node, and so we need only
+ * recurse into any sub-nodes it has.
+ */
+bool
+plan_tree_walker(Plan *plan,
+ bool (*walker) (),
+ void *context)
+{
+ /* Guard against stack overflow due to overly complex plan trees */
+ check_stack_depth();
+
+ /* initPlan-s */
+ if (plan_walk_subplans(plan->initPlan, walker, context))
+ return true;
+
+ /* lefttree */
+ if (outerPlan(plan))
+ {
+ if (walker(outerPlan(plan), context))
+ return true;
+ }
+
+ /* righttree */
+ if (innerPlan(plan))
+ {
+ if (walker(innerPlan(plan), context))
+ return true;
+ }
+
+ /* special child plans */
+ switch (nodeTag(plan))
+ {
+ case T_Append:
+ if (plan_walk_members(((Append *) plan)->appendplans,
+ walker, context))
+ return true;
+ break;
+ case T_MergeAppend:
+ if (plan_walk_members(((MergeAppend *) plan)->mergeplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapAnd:
+ if (plan_walk_members(((BitmapAnd *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_BitmapOr:
+ if (plan_walk_members(((BitmapOr *) plan)->bitmapplans,
+ walker, context))
+ return true;
+ break;
+ case T_CustomScan:
+ if (plan_walk_members(((CustomScan *) plan)->custom_plans,
+ walker, context))
+ return true;
+ break;
+ case T_SubqueryScan:
+ if (walker(((SubqueryScan *) plan)->subplan, context))
+ return true;
+ break;
+ default:
+ break;
+ }
+
+ return false;
+}
+
+/*
+ * Walk a list of SubPlans (or initPlans, which also use SubPlan nodes).
+ */
+static bool
+plan_walk_subplans(List *plans,
+ bool (*walker) (),
+ void *context)
+{
+ ListCell *lc;
+ PlannedStmt *plannedstmt = (PlannedStmt *) context;
+
+ foreach(lc, plans)
+ {
+ SubPlan *sp = lfirst_node(SubPlan, lc);
+ Plan *p = list_nth(plannedstmt->subplans, sp->plan_id - 1);
+
+ if (walker(p, context))
+ return true;
+ }
+
+ return false;
+}
+
+/*
+ * Walk the constituent plans of an Append, MergeAppend, BitmapAnd,
+ * BitmapOr, or CustomScan node.
+ */
+static bool
+plan_walk_members(List *plans, bool (*walker) (), void *context)
+{
+ ListCell *lc;
+
+ foreach(lc, plans)
+ {
+ if (walker(lfirst(lc), context))
+ return true;
+ }
+
+ return false;
+}
diff --git a/src/include/nodes/nodeFuncs.h b/src/include/nodes/nodeFuncs.h
index 93c60bde66..fca107ad65 100644
--- a/src/include/nodes/nodeFuncs.h
+++ b/src/include/nodes/nodeFuncs.h
@@ -158,5 +158,8 @@ extern bool raw_expression_tree_walker(Node *node, bool (*walker) (),
struct PlanState;
extern bool planstate_tree_walker(struct PlanState *planstate, bool (*walker) (),
void *context);
+struct Plan;
+extern bool plan_tree_walker(struct Plan *plan, bool (*walker) (),
+ void *context);
#endif /* NODEFUNCS_H */
--
2.24.1
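For readers unfamiliar with the walker convention that plan_tree_walker() follows, here is a minimal self-contained sketch. Everything in it is hypothetical (ToyPlan, toy_plan_tree_walker, count_scans_walker); it only assumes the same contract as the patch: the walker callback visits the current node itself, calls the tree walker to recurse into children, and returns true to abort the whole traversal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy node; PostgreSQL's Plan carries far more state. */
typedef struct ToyPlan
{
    int             relid;      /* relation scanned, or 0 for none */
    struct ToyPlan *left;       /* stands in for lefttree */
    struct ToyPlan *right;      /* stands in for righttree */
} ToyPlan;

typedef bool (*toy_walker) (ToyPlan *node, void *context);

/*
 * Recurse into the children of 'node' only; the walker is assumed to have
 * visited 'node' already.  Returns true as soon as any walker call does.
 */
static bool
toy_plan_tree_walker(ToyPlan *node, toy_walker walker, void *context)
{
    assert(node != NULL);

    if (node->left && walker(node->left, context))
        return true;
    if (node->right && walker(node->right, context))
        return true;
    return false;
}

/* Example walker: count the nodes that scan a relation. */
static bool
count_scans_walker(ToyPlan *node, void *context)
{
    if (node == NULL)
        return false;
    if (node->relid > 0)
        (*(int *) context)++;

    /* Never aborts, so this always ends up returning false */
    return toy_plan_tree_walker(node, count_scans_walker, context);
}
```

A caller starts the traversal by invoking the walker on the root, passing a pointer to an int counter as the context. A walker that collects relations to lock could follow the same shape, with the counter replaced by whatever set it accumulates.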
[application/octet-stream] v8-0002-Add-Merge-Append.partitioned_rels.patch (17.4K, 5-v8-0002-Add-Merge-Append.partitioned_rels.patch)
From 8b99146c9b8c4826e1434d3f006597681c24cd45 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Thu, 24 Mar 2022 22:47:03 +0900
Subject: [PATCH v8 2/4] Add [Merge]Append.partitioned_rels
To record the RT indexes of all partitioned ancestors leading up to
leaf partitions that are appended by the node.
If a given [Merge]Append node is left out from the plan due to there
being only one element in its list of child subplans, then its
partitioned_rels set is added to PlannerGlobal.elidedAppendPartedRels
that is passed down to the executor through PlannedStmt.
There are no users for partitioned_rels and elidedAppendPartedRels
as of this commit, though a later commit will require the ability
to extract the set of relations that must be locked to make a plan
tree safe for execution by walking the plan tree itself, so having
the partitioned tables be also present in the plan tree will be
helpful. Note that currently the executor relies on the fact that
the set of relations to be locked can be obtained by simply scanning
the range table that's made available in PlannedStmt along with the
plan tree.
---
src/backend/nodes/copyfuncs.c | 3 +++
src/backend/nodes/outfuncs.c | 5 +++++
src/backend/nodes/readfuncs.c | 3 +++
src/backend/optimizer/path/joinrels.c | 9 ++++++++
src/backend/optimizer/plan/createplan.c | 18 +++++++++++++++-
src/backend/optimizer/plan/planner.c | 8 +++++++
src/backend/optimizer/plan/setrefs.c | 28 +++++++++++++++++++++++++
src/backend/optimizer/util/inherit.c | 16 ++++++++++++++
src/backend/optimizer/util/relnode.c | 20 ++++++++++++++++++
src/include/nodes/pathnodes.h | 22 +++++++++++++++++++
src/include/nodes/plannodes.h | 17 +++++++++++++++
11 files changed, 148 insertions(+), 1 deletion(-)
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 56505557bf..29c515d7db 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -106,6 +106,7 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_NODE_FIELD(invalItems);
COPY_NODE_FIELD(paramExecTypes);
COPY_NODE_FIELD(utilityStmt);
+ COPY_BITMAPSET_FIELD(elidedAppendPartedRels);
COPY_LOCATION_FIELD(stmt_location);
COPY_SCALAR_FIELD(stmt_len);
@@ -254,6 +255,7 @@ _copyAppend(const Append *from)
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
@@ -282,6 +284,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
COPY_NODE_FIELD(part_prune_info);
+ COPY_BITMAPSET_FIELD(partitioned_rels);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 6e39590730..108ede9af9 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -324,6 +324,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
WRITE_NODE_FIELD(utilityStmt);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
WRITE_LOCATION_FIELD(stmt_location);
WRITE_INT_FIELD(stmt_len);
}
@@ -444,6 +445,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -461,6 +463,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
WRITE_NODE_FIELD(part_prune_info);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
@@ -2404,6 +2407,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_BOOL_FIELD(parallelModeOK);
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_CHAR_FIELD(maxParallelHazard);
+ WRITE_BITMAPSET_FIELD(elidedAppendPartedRels);
}
static void
@@ -2515,6 +2519,7 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
WRITE_BOOL_FIELD(partbounds_merged);
WRITE_BITMAPSET_FIELD(live_parts);
WRITE_BITMAPSET_FIELD(all_partrels);
+ WRITE_BITMAPSET_FIELD(partitioned_rels);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index c94b2561f0..ce146dd45e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1794,6 +1794,7 @@ _readPlannedStmt(void)
READ_NODE_FIELD(invalItems);
READ_NODE_FIELD(paramExecTypes);
READ_NODE_FIELD(utilityStmt);
+ READ_BITMAPSET_FIELD(elidedAppendPartedRels);
READ_LOCATION_FIELD(stmt_location);
READ_INT_FIELD(stmt_len);
@@ -1917,6 +1918,7 @@ _readAppend(void)
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
@@ -1939,6 +1941,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
READ_NODE_FIELD(part_prune_info);
+ READ_BITMAPSET_FIELD(partitioned_rels);
READ_DONE();
}
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 9da3ff2f9a..e74d40fee3 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -1549,6 +1549,15 @@ try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1, RelOptInfo *rel2,
populate_joinrel_with_paths(root, child_rel1, child_rel2,
child_joinrel, child_sjinfo,
child_restrictlist);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * joinrel's set.
+ */
+ joinrel->partitioned_rels =
+ bms_add_members(joinrel->partitioned_rels,
+ child_joinrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 179c87c671..99868a1a79 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -26,10 +26,12 @@
#include "nodes/extensible.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
+#include "optimizer/appendinfo.h"
#include "optimizer/clauses.h"
#include "optimizer/cost.h"
#include "optimizer/optimizer.h"
#include "optimizer/paramassign.h"
+#include "optimizer/pathnode.h"
#include "optimizer/paths.h"
#include "optimizer/placeholder.h"
#include "optimizer/plancat.h"
@@ -1332,11 +1334,11 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
best_path->subpaths,
prunequal);
}
-
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
plan->part_prune_info = partpruneinfo;
+ plan->partitioned_rels = bms_copy(rel->partitioned_rels);
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1500,6 +1502,20 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
node->mergeplans = subplans;
node->part_prune_info = partpruneinfo;
+ /*
+ * We need to explicitly add to the plan node the RT indexes of any
+ * partitioned tables whose partitions will be scanned by the nodes in
+ * 'subplans'. There can be multiple RT indexes in the set due to the
+ * partition tree being multi-level and/or this being a plan for UNION ALL
+ * over multiple partition trees. Along with scanrelids of leaf-level Scan
+ * nodes, this allows the executor to lock the full set of relations being
+ * scanned by this node.
+ *
+ * Note that 'apprelids' only contains the top-level base relation(s), so
+ * is not sufficient for the purpose.
+ */
+ node->partitioned_rels = bms_copy(rel->partitioned_rels);
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
* produce either the exact tlist or a narrow tlist, we should get rid of
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b2569c5d0c..c769b4b4b9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -529,6 +529,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->paramExecTypes = glob->paramExecTypes;
/* utilityStmt should be null, but we might as well copy it */
result->utilityStmt = parse->utilityStmt;
+ result->elidedAppendPartedRels = glob->elidedAppendPartedRels;
result->stmt_location = parse->stmt_location;
result->stmt_len = parse->stmt_len;
@@ -7534,6 +7535,13 @@ create_partitionwise_grouping_paths(PlannerInfo *root,
add_paths_to_append_rel(root, grouped_rel, grouped_live_children);
}
+
+ /*
+ * Input rel might be a partitioned appendrel, though grouped_rel has at
+ * this point taken its role as the appendrel owning the former's
+ * children, so copy the former's partitioned_rels set into the latter.
+ */
+ grouped_rel->partitioned_rels = bms_copy(input_rel->partitioned_rels);
}
/*
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index bf4c722c02..8214edec54 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1574,6 +1574,10 @@ set_append_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /* Fix up partitioned_rels before possibly removing the Append below. */
+ aplan->partitioned_rels = offset_relid_set(aplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the Append entirely. For this to be
* safe, there must be only one child plan and that child plan's parallel
@@ -1584,8 +1588,17 @@ set_append_references(PlannerInfo *root,
*/
if (list_length(aplan->appendplans) == 1 &&
((Plan *) linitial(aplan->appendplans))->parallel_aware == aplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned tables involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ aplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) aplan,
(Plan *) linitial(aplan->appendplans));
+ }
/*
* Otherwise, clean up the Append as needed. It's okay to do this after
@@ -1646,6 +1659,12 @@ set_mergeappend_references(PlannerInfo *root,
lfirst(l) = set_plan_refs(root, (Plan *) lfirst(l), rtoffset);
}
+ /*
+ * Fix up partitioned_rels before possibly removing the MergeAppend below.
+ */
+ mplan->partitioned_rels = offset_relid_set(mplan->partitioned_rels,
+ rtoffset);
+
/*
* See if it's safe to get rid of the MergeAppend entirely. For this to
* be safe, there must be only one child plan and that child plan's
@@ -1656,8 +1675,17 @@ set_mergeappend_references(PlannerInfo *root,
*/
if (list_length(mplan->mergeplans) == 1 &&
((Plan *) linitial(mplan->mergeplans))->parallel_aware == mplan->plan.parallel_aware)
+ {
+ /*
+ * Partitioned tables involved, if any, must be made known to the
+ * executor.
+ */
+ root->glob->elidedAppendPartedRels =
+ bms_add_members(root->glob->elidedAppendPartedRels,
+ mplan->partitioned_rels);
return clean_up_removed_plan_level((Plan *) mplan,
(Plan *) linitial(mplan->mergeplans));
+ }
/*
* Otherwise, clean up the MergeAppend as needed. It's okay to do this
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 7e134822f3..56912e4101 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -406,6 +406,14 @@ expand_partitioned_rtentry(PlannerInfo *root, RelOptInfo *relinfo,
childrte, childRTindex,
childrel, top_parentrc, lockmode);
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ relinfo->partitioned_rels = bms_add_members(relinfo->partitioned_rels,
+ childrelinfo->partitioned_rels);
+
/* Close child relation, but keep locks */
table_close(childrel, NoLock);
}
@@ -737,6 +745,14 @@ expand_appendrel_subquery(PlannerInfo *root, RelOptInfo *rel,
/* Child may itself be an inherited rel, either table or subquery. */
if (childrte->inh)
expand_inherited_rtentry(root, childrel, childrte, childRTindex);
+
+ /*
+ * A parent relation's partitioned_rels must be a superset of the sets
+ * of all its children, direct or indirect, so bubble up the child
+ * rel's set.
+ */
+ rel->partitioned_rels = bms_add_members(rel->partitioned_rels,
+ childrel->partitioned_rels);
}
}
diff --git a/src/backend/optimizer/util/relnode.c b/src/backend/optimizer/util/relnode.c
index 520409f4ba..1d082a8fdd 100644
--- a/src/backend/optimizer/util/relnode.c
+++ b/src/backend/optimizer/util/relnode.c
@@ -361,6 +361,10 @@ build_simple_rel(PlannerInfo *root, int relid, RelOptInfo *parent)
}
}
+ /* A partitioned appendrel. */
+ if (rel->part_scheme != NULL)
+ rel->partitioned_rels = bms_copy(rel->relids);
+
/* Save the finished struct in the query's simple_rel_array */
root->simple_rel_array[relid] = rel;
@@ -729,6 +733,14 @@ build_join_rel(PlannerInfo *root,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/*
* Set the consider_parallel flag if this joinrel could potentially be
* scanned within a parallel worker. If this flag is false for either
@@ -897,6 +909,14 @@ build_child_join_rel(PlannerInfo *root, RelOptInfo *outer_rel,
set_joinrel_size_estimates(root, joinrel, outer_rel, inner_rel,
sjinfo, restrictlist);
+ /*
+ * The joinrel may get processed as an appendrel via partitionwise join
+ * if both outer and inner rels are partitioned, so set partitioned_rels
+ * appropriately.
+ */
+ joinrel->partitioned_rels = bms_union(outer_rel->partitioned_rels,
+ inner_rel->partitioned_rels);
+
/* We build the join only once. */
Assert(!find_join_rel(root, joinrel->relids));
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6cbcb67bdf..ef9b54739a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -130,6 +130,11 @@ typedef struct PlannerGlobal
char maxParallelHazard; /* worst PROPARALLEL hazard level */
PartitionDirectory partition_directory; /* partition descriptors */
+
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
+ * that have been removed from the
+ * various plan trees. */
} PlannerGlobal;
/* macro for fetching the Plan associated with a SubPlan node */
@@ -773,6 +778,23 @@ typedef struct RelOptInfo
Relids all_partrels; /* Relids set of all partition relids */
List **partexprs; /* Non-nullable partition key expressions */
List **nullable_partexprs; /* Nullable partition key expressions */
+
+ /*
+ * For an appendrel parent relation (base, join, or upper) that is
+ * partitioned, this stores the RT indexes of all the partitioned ancestors
+ * including itself that lead up to the individual leaf partitions that
+ * will be scanned to produce this relation's output rows. The relid set
+ * is copied into the resulting Append or MergeAppend plan node for
+ * allowing the executor to take appropriate locks on those relations,
+ * unless the node is deemed useless in setrefs.c due to having a single
+ * leaf subplan and thus elided from the final plan, in which case, the set
+ * is added into PlannerGlobal.elidedAppendPartedRels.
+ *
+ * Note that 'apprelids' of those nodes only contains the top-level base
+ * relation(s), so is not sufficient for said purpose.
+ */
+
+ Bitmapset *partitioned_rels;
} RelOptInfo;
/*
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 50ef3dda05..a823c7c20d 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -86,6 +86,11 @@ typedef struct PlannedStmt
Node *utilityStmt; /* non-null if this is utility stmt */
+ Bitmapset *elidedAppendPartedRels; /* Combined partitioned_rels of all
+ * single-subplan [Merge]Append nodes
+ * that have been removed from the
+ * various plan trees. */
+
/* statement location in source string (copied from Query) */
int stmt_location; /* start location, or -1 if unknown */
int stmt_len; /* length in bytes; 0 means "rest of string" */
@@ -264,6 +269,12 @@ typedef struct Append
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in appendplans.
+ */
+ Bitmapset *partitioned_rels;
} Append;
/* ----------------
@@ -284,6 +295,12 @@ typedef struct MergeAppend
bool *nullsFirst; /* NULLS FIRST/LAST directions */
/* Info for run-time subplan pruning; NULL if we're not doing that */
struct PartitionPruneInfo *part_prune_info;
+
+ /*
+ * RT indexes of all partitioned parents whose partitions' plans are
+ * present in mergeplans.
+ */
+ Bitmapset *partitioned_rels;
} MergeAppend;
/* ----------------
--
2.24.1
* Re: generic plans and "initial" pruning
@ 2022-03-31 09:56 Alvaro Herrera <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Alvaro Herrera @ 2022-03-31 09:56 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers; David Rowley <[email protected]>
I'm looking at 0001 here with intention to commit later. I see that
there is some resistance to 0004, but I think a final verdict on that
one doesn't materially affect 0001.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"El destino baraja y nosotros jugamos" (A. Schopenhauer)
* Re: generic plans and "initial" pruning
@ 2022-03-31 11:11 Amit Langote <[email protected]>
parent: Alvaro Herrera <[email protected]>
0 siblings, 0 replies; 82+ messages in thread
From: Amit Langote @ 2022-03-31 11:11 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers; David Rowley <[email protected]>
On Thu, Mar 31, 2022 at 6:55 PM Alvaro Herrera <[email protected]> wrote:
> I'm looking at 0001 here with intention to commit later. I see that
> there is some resistance to 0004, but I think a final verdict on that
> one doesn't materially affect 0001.
Thanks.
While the main goal of the refactoring patch is to make it easier to
review the more complex changes that 0004 makes to execPartition.c, I
agree it has merit on its own. Although, one may say that the bit
about providing a PlanState-independent ExprContext is more closely
tied with 0004's requirements...
--
Amit Langote
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2022-05-27 08:09 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 2 replies; 82+ messages in thread
From: Amit Langote @ 2022-05-27 08:09 UTC (permalink / raw)
To: Zhihong Yu <[email protected]>; +Cc: David Rowley <[email protected]>; Robert Haas <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Mon, Apr 11, 2022 at 12:53 PM Zhihong Yu <[email protected]> wrote:
> On Sun, Apr 10, 2022 at 8:05 PM Amit Langote <[email protected]> wrote:
>> Sending v15 that fixes that to keep the cfbot green for now.
>
> Hi,
>
> + /* RT index of the partitione table. */
>
> partitione -> partitioned
Thanks, fixed.
Also, I broke this into patches:
0001 contains the mechanical changes of moving PartitionPruneInfo out
of Append/MergeAppend into a list in PlannedStmt.
0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
only unpruned partitions".
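
As a rough illustration of the indexing scheme in 0001 (this is not code
from the patch): the sketch below uses hypothetical stand-in types
(PruneInfo, InfoList, AppendNode) and helper names instead of the real
PartitionPruneInfo, List, and Append, and only assumes what the diff
shows, namely that a plan node stores a 0-based list index or -1, and
that a per-query index is shifted by the current length of the global
list before the per-query entries are appended to it.

```c
#include <stddef.h>

#define MAX_INFOS 16

/* Hypothetical stand-in for PartitionPruneInfo. */
typedef struct PruneInfo
{
	int			rtindex;
} PruneInfo;

/* Hypothetical stand-in for a List of PruneInfo pointers. */
typedef struct InfoList
{
	PruneInfo  *items[MAX_INFOS];
	int			len;
} InfoList;

/* Hypothetical stand-in for Append/MergeAppend. */
typedef struct AppendNode
{
	int			part_prune_index;	/* -1 means no run-time pruning */
} AppendNode;

/* Append info and return its 0-based index, or -1 if nothing was added. */
static int
info_list_add(InfoList *list, PruneInfo *info)
{
	if (info == NULL || list->len >= MAX_INFOS)
		return -1;
	list->items[list->len] = info;
	return list->len++;
}

/*
 * Shift a node's per-query index so that it is valid in the global list.
 * Must run before the per-query list is merged into the global one.
 */
static void
fixup_index(AppendNode *node, const InfoList *global)
{
	if (node->part_prune_index >= 0)
		node->part_prune_index += global->len;
}

/* Move per-query entries into the global list, offsetting RT indexes. */
static void
merge_into_global(InfoList *global, InfoList *local, int rtoffset)
{
	for (int i = 0; i < local->len; i++)
	{
		local->items[i]->rtindex += rtoffset;
		info_list_add(global, local->items[i]);
	}
	local->len = 0;
}

/* Resolve a node's index against the global list; NULL if none. */
static PruneInfo *
lookup(const InfoList *global, const AppendNode *node)
{
	if (node->part_prune_index < 0 || node->part_prune_index >= global->len)
		return NULL;
	return global->items[node->part_prune_index];
}
```

With this layout, a caller that only has the global list can visit every
entry without walking the plan tree, which is the property 0002 relies on.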
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v16-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (21.2K, 2-v16-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch)
From 16fd07b7c8ffde7632ffa7b03e4595e1e08d7e06 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v16 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of to the Append/MergeAppend plan
node that held it until now, and set an index field in the plan node
that points to the list element.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked to validate a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. It would be better
for the PartitionPruneInfos to be accessible directly, by simply
iterating over PlannedStmt.partPruneInfos, than to have to find them
individually by walking the plan tree.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/nodes/copyfuncs.c | 5 +-
src/backend/nodes/outfuncs.c | 7 ++-
src/backend/nodes/readfuncs.c | 5 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 12 +++--
src/include/partitioning/partprune.h | 8 +--
18 files changed, 104 insertions(+), 68 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..72fc273524 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f1fd7f7e8b..f73b8c2607 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..b55cdd2580 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1638,11 +1638,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 51d630fa89..8fbeaa4f36 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -96,6 +96,7 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(parallelModeNeeded);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_NODE_FIELD(partPruneInfos);
COPY_NODE_FIELD(rtable);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
@@ -253,7 +254,7 @@ _copyAppend(const Append *from)
COPY_NODE_FIELD(appendplans);
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -281,7 +282,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index ce12915592..72fcd8a6ee 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -321,6 +321,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_NODE_FIELD(partPruneInfos);
WRITE_NODE_FIELD(rtable);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
@@ -450,7 +451,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(appendplans);
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -467,7 +468,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(sortOperators, node->numCols);
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -2434,6 +2435,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
+ WRITE_NODE_FIELD(partPruneInfos);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2501,6 +2503,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
WRITE_BOOL_FIELD(partColsUpdated);
+ WRITE_NODE_FIELD(partPruneInfos);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 6a05b69415..bf602ff93e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1817,6 +1817,7 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(parallelModeNeeded);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_NODE_FIELD(partPruneInfos);
READ_NODE_FIELD(rtable);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
@@ -1949,7 +1950,7 @@ _readAppend(void)
READ_NODE_FIELD(appendplans);
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -1971,7 +1972,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(sortOperators, local_node->numCols);
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 76606faa3e..58a05cf673 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1426,7 +1426,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1519,6 +1518,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1542,13 +1544,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index a0f2390334..32e658b5d6 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,6 +518,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index d95fd89807..aafe1c149d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1640,21 +1663,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1712,21 +1726,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..d77f7d3aef 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5728801379..25e0bb976e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -596,6 +596,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a6e5db4eec..6995b0ecec 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -107,6 +107,9 @@ typedef struct PlannerGlobal
List *appendRelations; /* "flat" list of AppendRelInfos */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
@@ -378,6 +381,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0ea9a22dfb..297cacfb5b 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -64,6 +64,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -262,8 +265,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -282,8 +285,9 @@ typedef struct MergeAppend
Oid *sortOperators; /* OIDs of operators to sort them by */
Oid *collations; /* OIDs of collations */
bool *nullsFirst; /* NULLS FIRST/LAST directions */
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
[application/octet-stream] v16-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (87.1K, 3-v16-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From 6654d7c2b5c54d69d3f8a0136cfaf5593a3b7aae Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v16 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan and skip locking the partition scanned by those
subnodes.
The result of performing initial partition pruning this way before the
actual execution has started is made available to the actual execution via
PartitionPruneResult, made available along with the PlannedStmt by the
callers of the executor that used plancache.c to get the plan. It is NULL
in the cases in which the plan is obtained by calling the planner
directly or if the plan obtained by plancache.c is not a generic one.
---
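A toy illustration of the idea described in the commit message above, for
readers skimming the thread. It is a minimal sketch and not code from the
patch: ToyAppend, toy_initial_pruning() and toy_collect_lock_rtis() are
hypothetical stand-ins for an Append node, the "initial" pruning steps, and
the relation collection done by AcquireExecutorLocks(). It assumes one leaf
partition per subplan and simple range bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are PostgreSQL types. */
#define MAX_SUBPLANS 64

typedef struct ToyAppend
{
	int			nsubplans;					/* number of child subplans */
	int			leaf_rti[MAX_SUBPLANS];		/* RT index scanned by each child */
	int			lower_bound[MAX_SUBPLANS];	/* inclusive lower bound */
	int			upper_bound[MAX_SUBPLANS];	/* exclusive upper bound */
} ToyAppend;

/*
 * "Initial" pruning: depends only on an externally supplied parameter, so
 * it can run before execution starts.  Returns a bitmask of the offsets of
 * the surviving subplans.
 */
static uint64_t
toy_initial_pruning(const ToyAppend *node, int param)
{
	uint64_t	valid = 0;

	assert(node->nsubplans <= MAX_SUBPLANS);
	for (int i = 0; i < node->nsubplans; i++)
	{
		if (param >= node->lower_bound[i] && param < node->upper_bound[i])
			valid |= UINT64_C(1) << i;
	}
	return valid;
}

/*
 * Collect the RT indexes to lock, considering only surviving subplans.
 * Pruned subplans stay in the node; they are merely skipped here.
 * Returns the number of entries written to out_rtis.
 */
static int
toy_collect_lock_rtis(const ToyAppend *node, uint64_t valid, int *out_rtis)
{
	int			n = 0;

	for (int i = 0; i < node->nsubplans; i++)
	{
		if (valid & (UINT64_C(1) << i))
			out_rtis[n++] = node->leaf_rti[i];
	}
	return n;
}
```

In this sketch the bitmask plays the role of one element of the
PartitionPruneResult list: the caller would hand the same mask to the
executor instead of recomputing it, so that both see the same set of valid
subplans.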
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 27 +++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 234 +++++++++++++++++++++----
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/copyfuncs.c | 27 +++
src/backend/nodes/outfuncs.c | 29 +++
src/backend/nodes/readfuncs.c | 51 ++++++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 45 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 184 ++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 28 +++
src/include/nodes/nodes.h | 4 +
src/include/nodes/pathnodes.h | 9 +
src/include/nodes/plannodes.h | 19 ++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
34 files changed, 849 insertions(+), 96 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fca29a9a10..d839517693 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -541,7 +541,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 5d1f7089da..111d384982 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 767d9b9619..1d55a23ded 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index d1ee106465..e878209674 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 80738547ed..c7360712b1 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..e0802be723 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,29 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+proceeds by locking all the relations that will be scanned by that plan. If
+the generic plan contains nodes that can perform execution time partition
+pruning (that is, contain a PartitionPruneInfo), a subset of pruning steps
+contained in the PartitionPruneInfos that do not depend on execution actually
+having started (called "initial" pruning steps) are performed at this point
+to figure out the minimal set of child subplans that satisfy those pruning
+instructions. AcquireExecutorLocks() looking at a particular plan will then
+lock only the relations scanned by those surviving subplans (along with those
+present in PlannedStmt.minLockRelids), and ignore those scanned by the pruned
+subplans, even though the pruned subplans themselves are not removed from the
+plan tree. The result of pruning (that is, the set of indexes of surviving
+subplans in their parent's list of child subplans) is saved as a list of
+bitmapsets, with one element for every PartitionPruneInfo referenced in the
+plan (PlannedStmt.partPruneInfos). The list is packaged into a
+PartitionPruneResult node, which is passed along with the PlannedStmt to the
+executor via the QueryDesc. It is imperative that the executor and any third
+party code invoked by it that gets passed the plan tree look at the plan's
+PartitionPruneResult to determine whether a particular child subplan of a
+parent node that supports pruning is valid for a given execution.
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +309,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 72fc273524..45824624f8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f73b8c2607..7e6dab5623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b55cdd2580..86227301e9 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1593,8 +1599,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1611,6 +1619,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * to be executed of the parent plan node to which the PartitionPruneInfo
+ * belongs and also the set of the RT indexes of leaf partitions that will
+ * be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1628,8 +1643,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving subplans'
+ * indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1645,24 +1661,59 @@ ExecInitPartitionPruning(PlanState *planstate,
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1670,7 +1721,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1686,11 +1738,73 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1704,19 +1818,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1771,15 +1887,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as, when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key from its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1793,6 +1936,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1803,6 +1947,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1853,6 +1999,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1860,6 +2008,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1881,7 +2030,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1891,7 +2040,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2119,10 +2268,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2157,7 +2310,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2171,6 +2324,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2181,13 +2336,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2214,8 +2371,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2223,7 +2386,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..303a572c02 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 8fbeaa4f36..ca139797a8 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -97,7 +97,9 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
COPY_NODE_FIELD(partPruneInfos);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(minLockRelids);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -1284,6 +1286,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -1300,6 +1304,7 @@ _copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
COPY_POINTER_FIELD(subplan_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(relid_map, from->nparts * sizeof(Oid));
+ COPY_POINTER_FIELD(rti_map, from->nparts * sizeof(Index));
COPY_NODE_FIELD(initial_pruning_steps);
COPY_NODE_FIELD(exec_pruning_steps);
COPY_BITMAPSET_FIELD(execparamids);
@@ -5475,6 +5480,21 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static PartitionPruneResult *
+_copyPartitionPruneResult(const PartitionPruneResult *from)
+{
+ PartitionPruneResult *newnode = makeNode(PartitionPruneResult);
+
+ COPY_NODE_FIELD(valid_subplan_offs_list);
+ COPY_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -6571,6 +6591,13 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ retval = _copyPartitionPruneResult(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 72fcd8a6ee..53010bf059 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -322,7 +322,9 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -1017,6 +1019,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -1031,6 +1035,7 @@ _outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
WRITE_INT_ARRAY(subplan_map, node->nparts);
WRITE_INT_ARRAY(subpart_map, node->nparts);
WRITE_OID_ARRAY(relid_map, node->nparts);
+ WRITE_INDEX_ARRAY(rti_map, node->nparts);
WRITE_NODE_FIELD(initial_pruning_steps);
WRITE_NODE_FIELD(exec_pruning_steps);
WRITE_BITMAPSET_FIELD(execparamids);
@@ -2436,6 +2441,8 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2857,6 +2864,21 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outPartitionPruneResult(StringInfo str, const PartitionPruneResult *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNERESULT");
+
+ WRITE_NODE_FIELD(valid_subplan_offs_list);
+ WRITE_BITMAPSET_FIELD(scan_leafpart_rtis);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4766,6 +4788,13 @@ outNode(StringInfo str, const void *obj)
_outJsonTableSibling(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ _outPartitionPruneResult(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index bf602ff93e..c1d131aa99 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -164,6 +164,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -1818,7 +1823,9 @@ _readPlannedStmt(void)
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
READ_NODE_FIELD(partPruneInfos);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(minLockRelids);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -2770,6 +2777,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2786,6 +2795,7 @@ _readPartitionedRelPruneInfo(void)
READ_INT_ARRAY(subplan_map, local_node->nparts);
READ_INT_ARRAY(subpart_map, local_node->nparts);
READ_OID_ARRAY(relid_map, local_node->nparts);
+ READ_INDEX_ARRAY(rti_map, local_node->nparts);
READ_NODE_FIELD(initial_pruning_steps);
READ_NODE_FIELD(exec_pruning_steps);
READ_BITMAPSET_FIELD(execparamids);
@@ -2939,6 +2949,21 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+
+/*
+ * _readPartitionPruneResult
+ */
+static PartitionPruneResult *
+_readPartitionPruneResult(void)
+{
+ READ_LOCALS(PartitionPruneResult);
+
+ READ_NODE_FIELD(valid_subplan_offs_list);
+ READ_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3236,6 +3261,8 @@ parseNodeString(void)
return_value = _readJsonTableParent();
else if (MATCH("JSONTABLESIBLING", 16))
return_value = _readJsonTableSibling();
+ else if (MATCH("PARTITIONPRUNERESULT", 20))
+ return_value = _readPartitionPruneResult();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
@@ -3379,6 +3406,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 32e658b5d6..edbf19716e 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,7 +519,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index aafe1c149d..a32fc70785 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,49 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Likewise for the RT indexes of leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it just above, to prevent empty tail bits resulting in
+ * inefficient looping during AcquireExecutorLocks().
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d77f7d3aef..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * The first pass also determines whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 8b6b5bbaaa..7f0eda48a4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1603,6 +1603,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1978,7 +1979,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1993,6 +1996,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..8c164741f7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein were allocated in the
+ * caller's (hopefully short-lived) context, so they will not stay leaked
+ * for long; reset the pointer anyway so it is not accidentally used.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or a NULL is added to
+ * *part_prune_result_list if needed. The former if the PlannedStmt is from
+ * the existing CachedPlan that is otherwise valid and contains at least one
+ * PartitionPruneInfo that has "initial" pruning steps. Those steps are
+ * performed by calling ExecutorDoInitialPruning() to determine only those
+ * leaf partitions that need to be locked by AcquireExecutorLocks() by pruning
+ * away subplans that don't match the pruning conditions. The
+ * PartitionPruneResult contains a list of bitmapsets of the indexes of
+ * matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,35 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of partitions to be locked from the
+ * PartitionPruneInfos by considering the result of performing
+ * initial partition pruning.
+ */
+ part_prune_result =
+ ExecutorDoInitialPruning(plannedstmt, boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1872,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..1bbe6b704b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 666977fb1f..bbc8c42d88 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..5c4a282be0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 25e0bb976e..d3ae0fa52d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -986,6 +986,34 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of invoking ExecutorDoInitialPruning() on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() for every
+ * PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT indexes of the
+ * leaf partitions scanned by those subplans across all PartitionPruneInfos
+ * are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor. If this node
+ * is available, the executor refers to it when initializing the plan nodes to
+ * which those PartitionPruneInfos apply so that the same set of qualifying
+ * subplans is initialized, rather than deriving that set again by redoing
+ * initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index b3b407579b..84d67d5dcf 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -97,6 +97,9 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_PartitionPruneResult,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
@@ -674,6 +677,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6995b0ecec..c47ce6c09b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -110,6 +110,15 @@ typedef struct PlannerGlobal
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 297cacfb5b..ffb52e2ac2 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -67,8 +67,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1196,6 +1205,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1204,6 +1220,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1234,6 +1252,7 @@ typedef struct PartitionedRelPruneInfo
int *subplan_map; /* subplan index by partition index, or -1 */
int *subpart_map; /* subpart index by partition index, or -1 */
Oid *relid_map; /* relation OID by partition index, or 0 */
+ Index *rti_map; /* Range table index by partition index, or 0 */
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
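The locking scheme in the patch above (lock the union of minLockRelids and the leaf partitions that survive initial pruning, remember exactly which RT indexes were locked, and release only those if the plan turns out to be invalid) can be pictured with a toy model. This is a minimal sketch and not PostgreSQL code: ToySet, lock_count, toy_acquire_locks(), toy_release_locks() and toy_locks_held() are hypothetical names, and a 64-bit mask stands in for a Bitmapset.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t ToySet;

#define TOY_MAX_RTI 63

/* Hypothetical lock table: number of locks currently held per RT index. */
static int lock_count[TOY_MAX_RTI + 1];

/*
 * Lock everything in min_lock_relids plus the leaf partitions that survived
 * initial pruning, and return the set that was actually locked so that the
 * caller can release exactly those entries later.
 */
ToySet
toy_acquire_locks(ToySet min_lock_relids, ToySet scan_leafpart_rtis)
{
    ToySet all_lock_relids = min_lock_relids | scan_leafpart_rtis;
    ToySet locked = 0;

    /* RT indexes are 1-based, so bit 0 is never used. */
    for (int rti = 1; rti <= TOY_MAX_RTI; rti++)
    {
        if (all_lock_relids & ((ToySet) 1 << rti))
        {
            lock_count[rti]++;
            locked |= (ToySet) 1 << rti;
        }
    }
    return locked;
}

/* Release only the entries recorded by an earlier toy_acquire_locks() call. */
void
toy_release_locks(ToySet locked)
{
    for (int rti = 1; rti <= TOY_MAX_RTI; rti++)
    {
        if (locked & ((ToySet) 1 << rti))
        {
            assert(lock_count[rti] > 0);
            lock_count[rti]--;
        }
    }
}

/* Total number of locks currently held, summed over all RT indexes. */
int
toy_locks_held(void)
{
    int total = 0;

    for (int rti = 1; rti <= TOY_MAX_RTI; rti++)
        total += lock_count[rti];
    return total;
}
```

Returning the locked set from the acquire step is the point of the sketch: the release step cannot redo pruning to find out what was locked, so it has to be told.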
* Re: generic plans and "initial" pruning
@ 2022-05-27 20:08 Zhihong Yu <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 0 replies; 82+ messages in thread
From: Zhihong Yu @ 2022-05-27 20:08 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: David Rowley <[email protected]>; Robert Haas <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Fri, May 27, 2022 at 1:10 AM Amit Langote <[email protected]> wrote:
> On Mon, Apr 11, 2022 at 12:53 PM Zhihong Yu <[email protected]> wrote:
> > On Sun, Apr 10, 2022 at 8:05 PM Amit Langote <[email protected]> wrote:
> >> Sending v15 that fixes that to keep the cfbot green for now.
> >
> > Hi,
> >
> > + /* RT index of the partitione table. */
> >
> > partitione -> partitioned
>
> Thanks, fixed.
>
> Also, I broke this into patches:
>
> 0001 contains the mechanical changes of moving PartitionPruneInfo out
> of Append/MergeAppend into a list in PlannedStmt.
>
> 0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
> only unpruned partitions".
>
> --
> Thanks, Amit Langote
> EDB: http://www.enterprisedb.com
Hi,
In the description:
is made available to the actual execution via
PartitionPruneResult, made available along with the PlannedStmt by the
I think the second `made available` is redundant (can be omitted).
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecDoInitialPruning()), and in that case only the surviving subplans'
I wonder if there is a typo above: I can't find ExecDoInitialPruning
in either the PG codebase or the patches (except in this comment).
I think it should be ExecutorDoInitialPruning.
+ * bit from it just above to prevent empty tail bits resulting in
I searched the code base but didn't find any mention of `empty tail bit`.
Do you mind explaining a bit about it?
Cheers
* Re: generic plans and "initial" pruning
@ 2022-07-05 17:43 Jacob Champion <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Jacob Champion @ 2022-07-05 17:43 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Robert Haas <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Fri, May 27, 2022 at 1:09 AM Amit Langote <[email protected]> wrote:
> 0001 contains the mechanical changes of moving PartitionPruneInfo out
> of Append/MergeAppend into a list in PlannedStmt.
>
> 0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
> only unpruned partitions".
This patchset will need to be rebased over 835d476fd21; looks like
just a cosmetic change.
--Jacob
* Re: generic plans and "initial" pruning
@ 2022-07-06 02:37 Amit Langote <[email protected]>
parent: Jacob Champion <[email protected]>
0 siblings, 2 replies; 82+ messages in thread
From: Amit Langote @ 2022-07-06 02:37 UTC (permalink / raw)
To: Jacob Champion <[email protected]>; +Cc: Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Robert Haas <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Wed, Jul 6, 2022 at 2:43 AM Jacob Champion <[email protected]> wrote:
> On Fri, May 27, 2022 at 1:09 AM Amit Langote <[email protected]> wrote:
> > 0001 contains the mechanical changes of moving PartitionPruneInfo out
> > of Append/MergeAppend into a list in PlannedStmt.
> >
> > 0002 is the main patch to "Optimize AcquireExecutorLocks() by locking
> > only unpruned partitions".
>
> This patchset will need to be rebased over 835d476fd21; looks like
> just a cosmetic change.
Thanks for the heads up.
Rebased and also fixed per comments given by Zhihong Yu on May 28.
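The 0001 change quoted above (PartitionPruneInfos kept in a list in PlannedStmt, with Append/MergeAppend holding only an index into that list) can be pictured with a toy sketch. It is an illustration under stated assumptions rather than the patch's code: the Toy* structs and both helper functions are hypothetical, a plain array stands in for the List, and the only detail taken from the patch is that a negative part_prune_index means the node has no pruning info.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PartitionPruneInfo. */
typedef struct ToyPruneInfo
{
    int n_initial_steps;        /* > 0 if "initial" pruning steps exist */
} ToyPruneInfo;

/* Hypothetical stand-in for PlannedStmt: owns all the pruning infos. */
typedef struct ToyPlannedStmt
{
    ToyPruneInfo *part_prune_infos;
    int n_part_prune_infos;
} ToyPlannedStmt;

/* Hypothetical stand-in for Append/MergeAppend. */
typedef struct ToyAppend
{
    int part_prune_index;       /* index into the list, or -1 if none */
} ToyAppend;

/* Resolve a plan node's index to its pruning info, or NULL if it has none. */
const ToyPruneInfo *
toy_get_prune_info(const ToyPlannedStmt *stmt, const ToyAppend *node)
{
    if (node->part_prune_index < 0)
        return NULL;
    assert(node->part_prune_index < stmt->n_part_prune_infos);
    return &stmt->part_prune_infos[node->part_prune_index];
}

/*
 * With the infos in one flat list, a caller can inspect all of them without
 * walking the plan tree.
 */
int
toy_contains_initial_pruning(const ToyPlannedStmt *stmt)
{
    for (int i = 0; i < stmt->n_part_prune_infos; i++)
    {
        if (stmt->part_prune_infos[i].n_initial_steps > 0)
            return 1;
    }
    return 0;
}
```

The second helper shows why the flat list is convenient for the later locking change: checking every pruning info is a plain loop over the statement's list.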
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v17-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (21.2K, 2-v17-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch)
From 665055be44caaec9dcc2a3251f20ceb3c678fa3d Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v17 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set instead in the
latter is an index field which points to the list element
of PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be accessible directly, by simply
iterating over PlannedStmt.partPruneInfos, than to require a walk
of the plan tree to find them.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/nodes/copyfuncs.c | 5 +-
src/backend/nodes/outfuncs.c | 7 ++-
src/backend/nodes/readfuncs.c | 5 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
18 files changed, 103 insertions(+), 68 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..72fc273524 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f1fd7f7e8b..f73b8c2607 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..b55cdd2580 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1638,11 +1638,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 706d283a92..b02b4a641c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -96,6 +96,7 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(parallelModeNeeded);
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
+ COPY_NODE_FIELD(partPruneInfos);
COPY_NODE_FIELD(rtable);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
@@ -253,7 +254,7 @@ _copyAppend(const Append *from)
COPY_NODE_FIELD(appendplans);
COPY_SCALAR_FIELD(nasyncplans);
COPY_SCALAR_FIELD(first_partial_plan);
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
@@ -281,7 +282,7 @@ _copyMergeAppend(const MergeAppend *from)
COPY_POINTER_FIELD(sortOperators, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(collations, from->numCols * sizeof(Oid));
COPY_POINTER_FIELD(nullsFirst, from->numCols * sizeof(bool));
- COPY_NODE_FIELD(part_prune_info);
+ COPY_SCALAR_FIELD(part_prune_index);
return newnode;
}
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4315c53080..7618444b4d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -325,6 +325,7 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_BOOL_FIELD(parallelModeNeeded);
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
+ WRITE_NODE_FIELD(partPruneInfos);
WRITE_NODE_FIELD(rtable);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
@@ -454,7 +455,7 @@ _outAppend(StringInfo str, const Append *node)
WRITE_NODE_FIELD(appendplans);
WRITE_INT_FIELD(nasyncplans);
WRITE_INT_FIELD(first_partial_plan);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -471,7 +472,7 @@ _outMergeAppend(StringInfo str, const MergeAppend *node)
WRITE_OID_ARRAY(sortOperators, node->numCols);
WRITE_OID_ARRAY(collations, node->numCols);
WRITE_BOOL_ARRAY(nullsFirst, node->numCols);
- WRITE_NODE_FIELD(part_prune_info);
+ WRITE_INT_FIELD(part_prune_index);
}
static void
@@ -2438,6 +2439,7 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(finalrowmarks);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
+ WRITE_NODE_FIELD(partPruneInfos);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2505,6 +2507,7 @@ _outPlannerInfo(StringInfo str, const PlannerInfo *node)
WRITE_BITMAPSET_FIELD(curOuterRels);
WRITE_NODE_FIELD(curOuterParams);
WRITE_BOOL_FIELD(partColsUpdated);
+ WRITE_NODE_FIELD(partPruneInfos);
}
static void
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 6a05b69415..bf602ff93e 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -1817,6 +1817,7 @@ _readPlannedStmt(void)
READ_BOOL_FIELD(parallelModeNeeded);
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
+ READ_NODE_FIELD(partPruneInfos);
READ_NODE_FIELD(rtable);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
@@ -1949,7 +1950,7 @@ _readAppend(void)
READ_NODE_FIELD(appendplans);
READ_INT_FIELD(nasyncplans);
READ_INT_FIELD(first_partial_plan);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
@@ -1971,7 +1972,7 @@ _readMergeAppend(void)
READ_OID_ARRAY(sortOperators, local_node->numCols);
READ_OID_ARRAY(collations, local_node->numCols);
READ_BOOL_ARRAY(nullsFirst, local_node->numCols);
- READ_NODE_FIELD(part_prune_info);
+ READ_INT_FIELD(part_prune_index);
READ_DONE();
}
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 76606faa3e..58a05cf673 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1426,7 +1426,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1519,6 +1518,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1542,13 +1544,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 06ad856eac..b11249ed8f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,6 +518,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 9cef92cab2..b8d5610593 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1655,21 +1678,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1727,21 +1741,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..d77f7d3aef 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5728801379..25e0bb976e 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -596,6 +596,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index b88cfb8dc0..a0f3a46334 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -107,6 +107,9 @@ typedef struct PlannerGlobal
List *appendRelations; /* "flat" list of AppendRelInfos */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
@@ -386,6 +389,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index d5c0ebe859..c3f4a39657 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -64,6 +64,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -262,8 +265,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -297,8 +300,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
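[Reader's note, not part of the patch] The index bookkeeping in the 0001 patch above can be sketched in standalone C. This is only an illustration under simplifying assumptions: InfoList, add_prune_info(), rebase_index() and flatten_into_global() are hypothetical names, and a plain int stands in for each PartitionPruneInfo. It mirrors two things from the patch: make_partition_pruneinfo() returning a 0-based index or -1, and set_append_references() adding the current length of the global list to that index before the per-query list is appended to it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INFOS 16

/* Hypothetical stand-in for a List of PartitionPruneInfo nodes. */
typedef struct InfoList
{
    int length;
    int items[MAX_INFOS];   /* each int stands in for one PartitionPruneInfo */
} InfoList;

/*
 * Append an item to the per-query list and return its 0-based index,
 * or -1 if nothing was added (same convention as the patched
 * make_partition_pruneinfo()).
 */
static int
add_prune_info(InfoList *local, int info, int has_useful_quals)
{
    if (!has_useful_quals || local->length >= MAX_INFOS)
        return -1;
    local->items[local->length] = info;
    return local->length++;
}

/*
 * Rebase a per-query index onto the global list.  This must run before
 * the per-query list is appended to the global one, so that the offset
 * is the number of entries contributed by earlier queries.
 */
static int
rebase_index(int part_prune_index, const InfoList *global)
{
    if (part_prune_index >= 0)
        part_prune_index += global->length;
    return part_prune_index;
}

/* Append every per-query entry to the global list, preserving order. */
static void
flatten_into_global(InfoList *global, const InfoList *local)
{
    for (int i = 0; i < local->length && global->length < MAX_INFOS; i++)
        global->items[global->length++] = local->items[i];
}
```

In this sketch a plan node keeps only the rebased integer, so copying or serializing the node does not duplicate the pruning info; a lookup is global.items[index] after a check for -1.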
[application/octet-stream] v17-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (87.2K, 3-v17-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From e5d0283732311fb068ad75ee4ff282ebe5306266 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v17 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan, and to skip locking the partitions scanned by
those subnodes.
The result of performing initial partition pruning this way, before the
actual execution has started, is passed to the executor as a
PartitionPruneResult, supplied along with the PlannedStmt by the
callers of the executor that used plancache.c to get the plan. It is NULL
in the cases in which the plan is obtained by calling the planner
directly or if the plan obtained by plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 234 +++++++++++++++++++++----
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/copyfuncs.c | 27 +++
src/backend/nodes/outfuncs.c | 29 +++
src/backend/nodes/readfuncs.c | 51 ++++++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 184 ++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 27 +++
src/include/nodes/nodes.h | 4 +
src/include/nodes/pathnodes.h | 9 +
src/include/nodes/plannodes.h | 21 +++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
34 files changed, 856 insertions(+), 96 deletions(-)
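[Reader's note, not part of the patch] The contract described in the commit message can be sketched as follows. This is a toy model under stated assumptions, not the patch's code: PruneResultSketch, do_initial_pruning(), record_pruning() and valid_subplans_for() are hypothetical names, a 64-bit mask stands in for a Bitmapset, and the pruning rule (subplan i covers keys in [10*i, 10*i + 10)) is made up. The point it illustrates is that pruning runs once at lock time, the surviving subplan offsets are stored per PartitionPruneInfo, and the executor looks them up by index instead of pruning again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PRUNE_INFOS 8

/*
 * Hypothetical, simplified PartitionPruneResult: one bitmask of surviving
 * subplan offsets per PartitionPruneInfo, in list order.
 */
typedef struct PruneResultSketch
{
    int      ninfos;
    uint64_t valid_subplan_offs[MAX_PRUNE_INFOS];
} PruneResultSketch;

/*
 * Stand-in for "initial" pruning.  Assumption for illustration only:
 * subplan i scans a partition holding keys in [10*i, 10*i + 10).
 */
static uint64_t
do_initial_pruning(int nsubplans, int key)
{
    uint64_t valid = 0;

    for (int i = 0; i < nsubplans && i < 64; i++)
    {
        if (key >= i * 10 && key < (i + 1) * 10)
            valid |= UINT64_C(1) << i;
    }
    return valid;
}

/* Lock-time step: prune once and remember which subplans survived. */
static void
record_pruning(PruneResultSketch *result, int nsubplans, int key)
{
    if (result->ninfos < MAX_PRUNE_INFOS)
        result->valid_subplan_offs[result->ninfos++] =
            do_initial_pruning(nsubplans, key);
}

/*
 * Executor-side step: if a result was supplied, reuse it so that only
 * subplans whose relations were locked are considered; otherwise (no
 * result, as with a freshly planned statement) prune here.
 */
static uint64_t
valid_subplans_for(const PruneResultSketch *result, int part_prune_index,
                   int nsubplans, int key)
{
    if (result != NULL &&
        part_prune_index >= 0 && part_prune_index < result->ninfos)
        return result->valid_subplan_offs[part_prune_index];
    return do_initial_pruning(nsubplans, key);
}
```

Reusing the stored mask, rather than recomputing it, is what keeps the executor from initializing a subplan whose relation was never locked, in case a second pruning pass would select a different set.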
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fca29a9a10..d839517693 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -541,7 +541,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e29c2ae206..e41b13a3ea 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 3db859c3ea..631cc07217 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index d1ee106465..e878209674 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 2333aae467..83465e40f8 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..953a476ea5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed at this
+point to figure out the minimal set of child subplans that satisfy those
+pruning steps. AcquireExecutorLocks() looking at a given plan tree will then
+lock only the relations scanned by the child subplans that survived such
+pruning, along with those present in PlannedStmt.minLockRelids. Note that the
+subplans are only notionally pruned in that they are not removed from the plan
+tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a
+PartitionPruneResult node via the QueryDesc. It consists of the set of
+indexes of surviving subplans in their respective parent plan node's list of
+child subplans, saved as a list of bitmapsets, with one element for every
+parent plan node whose PartitionPruneInfo is present in
+PlannedStmt.partPruneInfos. In other words, the executor should not
+re-evaluate the set of initially valid subplans by redoing the initial pruning
+if it was already done by AcquireExecutorLocks(), because the re-evaluation may
+very well end up resulting in a different set of subplans, containing some
+whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 72fc273524..45824624f8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f73b8c2607..7e6dab5623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b55cdd2580..24e6f6e988 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1593,8 +1599,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once, either during executor startup or in ExecutorDoInitialPruning(),
+ * which runs as part of AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1611,6 +1619,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the parent plan node (the one that owns the PartitionPruneInfo)
+ * that must be executed, and also the set of RT indexes of the leaf
+ * partitions that those subplans will scan.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1628,8 +1643,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1645,24 +1661,59 @@ ExecInitPartitionPruning(PlanState *planstate,
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1670,7 +1721,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1686,11 +1738,73 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. This is okay because the initial pruning steps
+ * do not contain anything that requires execution to have already
+ * started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1704,19 +1818,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1771,15 +1887,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * We must open the relation ourselves when called before execution
+ * has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (The first relation in a
+ * partrelpruneinfos list is always the root partitioned table
+ * appearing in the query, which AcquirePlannerLocks() would have
+ * locked; the Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table, which is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1793,6 +1936,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1803,6 +1947,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1853,6 +1999,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1860,6 +2008,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1881,7 +2030,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1891,7 +2040,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2119,10 +2268,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2157,7 +2310,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2171,6 +2324,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2181,13 +2336,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2214,8 +2371,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2223,7 +2386,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..303a572c02 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index b02b4a641c..332d58381b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -97,7 +97,9 @@ _copyPlannedStmt(const PlannedStmt *from)
COPY_SCALAR_FIELD(jitFlags);
COPY_NODE_FIELD(planTree);
COPY_NODE_FIELD(partPruneInfos);
+ COPY_SCALAR_FIELD(containsInitialPruning);
COPY_NODE_FIELD(rtable);
+ COPY_BITMAPSET_FIELD(minLockRelids);
COPY_NODE_FIELD(resultRelations);
COPY_NODE_FIELD(appendRelations);
COPY_NODE_FIELD(subplans);
@@ -1284,6 +1286,8 @@ _copyPartitionPruneInfo(const PartitionPruneInfo *from)
PartitionPruneInfo *newnode = makeNode(PartitionPruneInfo);
COPY_NODE_FIELD(prune_infos);
+ COPY_SCALAR_FIELD(needs_init_pruning);
+ COPY_SCALAR_FIELD(needs_exec_pruning);
COPY_BITMAPSET_FIELD(other_subplans);
return newnode;
@@ -1300,6 +1304,7 @@ _copyPartitionedRelPruneInfo(const PartitionedRelPruneInfo *from)
COPY_POINTER_FIELD(subplan_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(subpart_map, from->nparts * sizeof(int));
COPY_POINTER_FIELD(relid_map, from->nparts * sizeof(Oid));
+ COPY_POINTER_FIELD(rti_map, from->nparts * sizeof(Index));
COPY_NODE_FIELD(initial_pruning_steps);
COPY_NODE_FIELD(exec_pruning_steps);
COPY_BITMAPSET_FIELD(execparamids);
@@ -5476,6 +5481,21 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* ****************************************************************
+ * execnodes.h copy functions
+ * ****************************************************************
+ */
+static PartitionPruneResult *
+_copyPartitionPruneResult(const PartitionPruneResult *from)
+{
+ PartitionPruneResult *newnode = makeNode(PartitionPruneResult);
+
+ COPY_NODE_FIELD(valid_subplan_offs_list);
+ COPY_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ return newnode;
+}
+
/* ****************************************************************
* value.h copy functions
* ****************************************************************
@@ -6572,6 +6592,13 @@ copyObjectImpl(const void *from)
retval = _copyPublicationTable(from);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ retval = _copyPartitionPruneResult(from);
+ break;
+
/*
* MISCELLANEOUS NODES
*/
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 7618444b4d..7346820eee 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -326,7 +326,9 @@ _outPlannedStmt(StringInfo str, const PlannedStmt *node)
WRITE_INT_FIELD(jitFlags);
WRITE_NODE_FIELD(planTree);
WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
WRITE_NODE_FIELD(rtable);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(subplans);
@@ -1021,6 +1023,8 @@ _outPartitionPruneInfo(StringInfo str, const PartitionPruneInfo *node)
WRITE_NODE_TYPE("PARTITIONPRUNEINFO");
WRITE_NODE_FIELD(prune_infos);
+ WRITE_BOOL_FIELD(needs_init_pruning);
+ WRITE_BOOL_FIELD(needs_exec_pruning);
WRITE_BITMAPSET_FIELD(other_subplans);
}
@@ -1035,6 +1039,7 @@ _outPartitionedRelPruneInfo(StringInfo str, const PartitionedRelPruneInfo *node)
WRITE_INT_ARRAY(subplan_map, node->nparts);
WRITE_INT_ARRAY(subpart_map, node->nparts);
WRITE_OID_ARRAY(relid_map, node->nparts);
+ WRITE_INDEX_ARRAY(rti_map, node->nparts);
WRITE_NODE_FIELD(initial_pruning_steps);
WRITE_NODE_FIELD(exec_pruning_steps);
WRITE_BITMAPSET_FIELD(execparamids);
@@ -2440,6 +2445,8 @@ _outPlannerGlobal(StringInfo str, const PlannerGlobal *node)
WRITE_NODE_FIELD(resultRelations);
WRITE_NODE_FIELD(appendRelations);
WRITE_NODE_FIELD(partPruneInfos);
+ WRITE_BOOL_FIELD(containsInitialPruning);
+ WRITE_BITMAPSET_FIELD(minLockRelids);
WRITE_NODE_FIELD(relationOids);
WRITE_NODE_FIELD(invalItems);
WRITE_NODE_FIELD(paramExecTypes);
@@ -2861,6 +2868,21 @@ _outExtensibleNode(StringInfo str, const ExtensibleNode *node)
methods->nodeOut(str, node);
}
+/*****************************************************************************
+ *
+ * Stuff from execnodes.h
+ *
+ *****************************************************************************/
+
+static void
+_outPartitionPruneResult(StringInfo str, const PartitionPruneResult *node)
+{
+ WRITE_NODE_TYPE("PARTITIONPRUNERESULT");
+
+ WRITE_NODE_FIELD(valid_subplan_offs_list);
+ WRITE_BITMAPSET_FIELD(scan_leafpart_rtis);
+}
+
/*****************************************************************************
*
* Stuff from parsenodes.h.
@@ -4770,6 +4792,13 @@ outNode(StringInfo str, const void *obj)
_outJsonTableSibling(str, obj);
break;
+ /*
+ * EXECUTION NODES
+ */
+ case T_PartitionPruneResult:
+ _outPartitionPruneResult(str, obj);
+ break;
+
default:
/*
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index bf602ff93e..c1d131aa99 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -164,6 +164,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -1818,7 +1823,9 @@ _readPlannedStmt(void)
READ_INT_FIELD(jitFlags);
READ_NODE_FIELD(planTree);
READ_NODE_FIELD(partPruneInfos);
+ READ_BOOL_FIELD(containsInitialPruning);
READ_NODE_FIELD(rtable);
+ READ_BITMAPSET_FIELD(minLockRelids);
READ_NODE_FIELD(resultRelations);
READ_NODE_FIELD(appendRelations);
READ_NODE_FIELD(subplans);
@@ -2770,6 +2777,8 @@ _readPartitionPruneInfo(void)
READ_LOCALS(PartitionPruneInfo);
READ_NODE_FIELD(prune_infos);
+ READ_BOOL_FIELD(needs_init_pruning);
+ READ_BOOL_FIELD(needs_exec_pruning);
READ_BITMAPSET_FIELD(other_subplans);
READ_DONE();
@@ -2786,6 +2795,7 @@ _readPartitionedRelPruneInfo(void)
READ_INT_ARRAY(subplan_map, local_node->nparts);
READ_INT_ARRAY(subpart_map, local_node->nparts);
READ_OID_ARRAY(relid_map, local_node->nparts);
+ READ_INDEX_ARRAY(rti_map, local_node->nparts);
READ_NODE_FIELD(initial_pruning_steps);
READ_NODE_FIELD(exec_pruning_steps);
READ_BITMAPSET_FIELD(execparamids);
@@ -2939,6 +2949,21 @@ _readPartitionRangeDatum(void)
READ_DONE();
}
+
+/*
+ * _readPartitionPruneResult
+ */
+static PartitionPruneResult *
+_readPartitionPruneResult(void)
+{
+ READ_LOCALS(PartitionPruneResult);
+
+ READ_NODE_FIELD(valid_subplan_offs_list);
+ READ_BITMAPSET_FIELD(scan_leafpart_rtis);
+
+ READ_DONE();
+}
+
/*
* parseNodeString
*
@@ -3236,6 +3261,8 @@ parseNodeString(void)
return_value = _readJsonTableParent();
else if (MATCH("JSONTABLESIBLING", 16))
return_value = _readJsonTableSibling();
+ else if (MATCH("PARTITIONPRUNERESULT", 20))
+ return_value = _readPartitionPruneResult();
else
{
elog(ERROR, "badly formatted node string \"%.32s\"...", token);
@@ -3379,6 +3406,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b11249ed8f..7141035cc4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,7 +519,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b8d5610593..da749e331e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d77f7d3aef..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the 2nd pass is necessary
+ * by checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5ab91c2c58..5ae967608d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1603,6 +1603,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1978,7 +1979,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1993,6 +1996,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..8c164741f7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where initial partition
+ * pruning is performed, if the plan needs it.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects in it were allocated in the caller's
+ * hopefully short-lived context, so they will not stay leaked for long;
+ * reset the pointer anyway so that nobody looks at it by accident.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or NULL is added to
+ * *part_prune_result_list, if the caller passed one. It is the former when
+ * the PlannedStmt comes from an existing, otherwise valid CachedPlan and
+ * contains at least one PartitionPruneInfo with "initial" pruning steps.
+ * ExecutorDoInitialPruning() performs those steps, pruning away subplans
+ * that don't match the pruning conditions, so that AcquireExecutorLocks()
+ * locks only the leaf partitions that will actually be scanned. The
+ * PartitionPruneResult contains a list of bitmapsets of the indexes of
+ * matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,35 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of partitions to be locked from the
+ * PartitionPruneInfos by considering the result of performing
+ * initial partition pruning.
+ */
+ part_prune_result =
+ ExecutorDoInitialPruning(plannedstmt, boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1872,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..1bbe6b704b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..e57e133f0e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..5c4a282be0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 25e0bb976e..4d4bb3fc3c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -986,6 +986,33 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of invoking ExecutorDoInitialPruning() on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapset of the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() for every
+ * PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT indexes of the
+ * leaf partitions scanned by those subplans across all PartitionPruneInfos
+ * are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor. The executor
+ * refers to this node when initializing the plan nodes which contain subplans
+ * that may have been pruned by ExecutorDoInitialPruning(), rather than
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 7ce1fc4deb..c7f256028e 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -97,6 +97,9 @@ typedef enum NodeTag
T_PartitionPruneStepCombine,
T_PlanInvalItem,
+ /* TAGS FOR EXECUTOR PREP NODES (execnodes.h) */
+ T_PartitionPruneResult,
+
/*
* TAGS FOR PLAN STATE NODES (execnodes.h)
*
@@ -675,6 +678,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a0f3a46334..c2d91bb12f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -110,6 +110,15 @@ typedef struct PlannerGlobal
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
List *relationOids; /* OIDs of relations the plan depends on */
List *invalItems; /* other dependencies, as PlanInvalItems */
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c3f4a39657..869bf535bc 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -67,8 +67,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1386,6 +1395,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1394,6 +1410,8 @@ typedef struct PartitionPruneInfo
{
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1436,6 +1454,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map;
+ /* Range table index by partition index, or 0. */
+ Index *rti_map;
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
* Re: generic plans and "initial" pruning
@ 2022-07-13 06:40 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-07-13 06:40 UTC (permalink / raw)
To: Jacob Champion <[email protected]>; +Cc: Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Robert Haas <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
Rebased over 964d01ae90c.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v18-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (81.4K, 2-v18-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From 567059057ee35bcd8ca066f46d4c6b23641af090 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v18 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan, and to skip locking the partitions scanned by
those subnodes.
The result of performing initial partition pruning this way, before the
actual execution has started, is passed to the executor as a
PartitionPruneResult, which the callers of the executor that used
plancache.c to get the plan supply along with the PlannedStmt. It is NULL
when the plan is obtained by calling the planner directly or when the
plan obtained from plancache.c is not a generic one.
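As an illustration only (this is not part of the patch), the flow described
in the commit message can be sketched as a toy model in C. Every name below
(ToyPartition, toy_make_partitions, toy_initial_prune, toy_acquire_locks,
toy_init_subplans) is hypothetical and does not exist in PostgreSQL. A 32-bit
mask stands in for a Bitmapset, a boolean flag stands in for a relation lock,
and range partitioning on a single integer parameter is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one leaf partition under an Append node. */
typedef struct ToyPartition
{
	int			lower;		/* inclusive lower bound of the range */
	int			upper;		/* exclusive upper bound of the range */
	bool		locked;		/* stands in for a relation lock */
} ToyPartition;

/* Build nparts contiguous ranges of the given width, all unlocked. */
void
toy_make_partitions(ToyPartition *parts, int nparts, int width)
{
	for (int i = 0; i < nparts; i++)
	{
		parts[i].lower = i * width;
		parts[i].upper = (i + 1) * width;
		parts[i].locked = false;
	}
}

/*
 * "Initial" pruning: uses only an externally supplied parameter, so it can
 * run before execution has started.  Returns a mask of surviving subplans.
 */
uint32_t
toy_initial_prune(const ToyPartition *parts, int nparts, int param)
{
	uint32_t	valid = 0;

	assert(nparts >= 0 && nparts <= 32);
	for (int i = 0; i < nparts; i++)
	{
		if (param >= parts[i].lower && param < parts[i].upper)
			valid |= (uint32_t) 1 << i;
	}
	return valid;
}

/* Prune first, then lock only the partitions that survived pruning. */
uint32_t
toy_acquire_locks(ToyPartition *parts, int nparts, int param)
{
	uint32_t	valid = toy_initial_prune(parts, nparts, param);

	for (int i = 0; i < nparts; i++)
	{
		if (valid & ((uint32_t) 1 << i))
			parts[i].locked = true;
	}
	return valid;
}

/*
 * Executor startup: reuse the mask computed at lock time instead of pruning
 * again, so that only locked partitions are ever initialized.  Returns the
 * number of subplans initialized.
 */
int
toy_init_subplans(const ToyPartition *parts, int nparts, uint32_t valid)
{
	int			ninit = 0;

	for (int i = 0; i < nparts; i++)
	{
		if (valid & ((uint32_t) 1 << i))
		{
			assert(parts[i].locked);	/* never touch an unlocked partition */
			ninit++;
		}
	}
	return ninit;
}
```

In this model, a caller would run toy_acquire_locks() once and hand its
return value to toy_init_subplans(). Recomputing the mask at startup would
risk selecting a partition that was never locked, which is the hazard the
PartitionPruneResult is meant to avoid.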
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 234 +++++++++++++++++++++----
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/copyfuncs.c | 1 -
src/backend/nodes/outfuncs.c | 1 -
src/backend/nodes/readfuncs.c | 29 +++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 187 +++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 27 +++
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 13 ++
src/include/nodes/plannodes.h | 21 +++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
34 files changed, 782 insertions(+), 98 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fca29a9a10..d839517693 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -541,7 +541,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e29c2ae206..e41b13a3ea 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 3db859c3ea..631cc07217 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..b0ed96e56c 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 2333aae467..83465e40f8 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..953a476ea5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, so-called execution time pruning may also occur even before
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed at this
+point to figure out the minimal set of child subplans that satisfy those
+pruning steps. AcquireExecutorLocks() looking at a given plan tree will then
+lock only the relations scanned by the child subplans that survived such
+pruning, along with those present in PlannedStmt.minLockRelids. Note that the
+subplans are only notionally pruned in that they are not removed from the plan
+tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a
+PartitionPruneResult node via the QueryDesc. It consists of the set of
+indexes of surviving subplans in their respective parent plan node's list of
+child subplans, saved as a list of bitmapsets, with one element for every
+parent plan node whose PartitionPruneInfo is present in
+PlannedStmt.partPruneInfos. In other words, the executor should not
+re-evaluate the set of initially valid subplans by redoing the initial pruning
+if it was already done by AcquireExecutorLocks(), because the re-evaluation may
+very well end up resulting in a different set of subplans, containing some
+whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 72fc273524..45824624f8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f73b8c2607..7e6dab5623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b55cdd2580..24e6f6e988 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1593,8 +1599,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1611,6 +1619,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * to be executed of the parent plan node to which the PartitionPruneInfo
+ * belongs and also the set of the RT indexes of leaf partitions that will
+ * be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1628,8 +1643,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1645,24 +1661,59 @@ ExecInitPartitionPruning(PlanState *planstate,
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1670,7 +1721,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1686,11 +1738,73 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1704,19 +1818,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1771,15 +1887,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before
+ * execution has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1793,6 +1936,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1803,6 +1947,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1853,6 +1999,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1860,6 +2008,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1881,7 +2030,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1891,7 +2040,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2119,10 +2268,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2157,7 +2310,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2171,6 +2324,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2181,13 +2336,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2214,8 +2371,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2223,7 +2386,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..303a572c02 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e76fda8eba..afd0332ddd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -160,7 +160,6 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
-
/*
* copyObjectImpl -- implementation of copyObject(); see nodes/nodes.h
*
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 81f6a9093c..84a195adca 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -294,7 +294,6 @@ outDatum(StringInfo str, Datum value, int typlen, bool typbyval)
#include "outfuncs.funcs.c"
-
/*
* Support functions for nodes with custom_read_write attribute or
* special_read_write attribute
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 1421686938..d57478bde9 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -623,6 +628,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b11249ed8f..7141035cc4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,7 +519,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b8d5610593..da749e331e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Likewise for RT indexes of leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d77f7d3aef..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * The first pass also determines whether the 2nd pass is necessary by
+ * checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6f18b68856..16bda42f11 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1596,6 +1596,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1971,7 +1972,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1986,6 +1989,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..d1c9605979 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein have been allocated in the
+ * caller's hopefully short-lived context, so they will not remain leaked
+ * for long; still, reset the list so it isn't accidentally looked at.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * There are no actual PartitionPruneResults to add yet, but the list
+ * must be initialized to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or NULL is added to
+ * *part_prune_result_list if needed. It is the former if the PlannedStmt
+ * is from the existing CachedPlan that is otherwise valid and contains at
+ * least one PartitionPruneInfo with "initial" pruning steps. Those steps are
+ * performed by calling ExecutorDoInitialPruning() to determine only those
+ * leaf partitions that need to be locked by AcquireExecutorLocks() by pruning
+ * away subplans that don't match the pruning conditions. The
+ * PartitionPruneResult contains a list of bitmapsets of the indexes of
+ * matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,38 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_result = ExecutorDoInitialPruning(plannedstmt,
+ boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1875,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..1bbe6b704b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..e57e133f0e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..5c4a282be0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63a89474db..12ea06c2f6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1001,6 +1001,33 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of invoking ExecutorDoInitialPruning() on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapset of the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() for every
+ * PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT indexes of the
+ * leaf partitions scanned by those subplans across all PartitionPruneInfos
+ * are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor. The executor
+ * refers to this node when initializing the plan nodes which contain subplans
+ * that may have been pruned by ExecutorDoInitialPruning(), rather than
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index cdd6debfa0..b33d9e426d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d87957ff6c..7957aeb6d7 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,19 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial (pre-exec) pruning
+ * steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f2daabb3b7..1d2c0d9bdf 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -72,8 +72,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1409,6 +1418,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1419,6 +1435,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1463,6 +1481,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
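
For readers following the AcquireExecutorLocks() and ReleaseExecutorLocks() hunks above, here is a minimal, self-contained sketch of the lock-set idea: lock the relations that must always be locked (minLockRelids) plus only those leaf partitions that survive initial pruning. It is not PostgreSQL code. RelidSet, relid_set_add(), relid_set_next(), relid_set_count() and compute_lock_set() are hypothetical toy names, and a 64-bit mask stands in for a Bitmapset.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t RelidSet;

/* Hypothetical helper: add RT index rti (1..63) to a set. */
static RelidSet
relid_set_add(RelidSet set, int rti)
{
	assert(rti > 0 && rti < 64);
	return set | ((RelidSet) 1 << rti);
}

/*
 * Return the smallest member greater than prev, or -2 if there is none.
 * This mirrors the return convention of bms_next_member().
 */
static int
relid_set_next(RelidSet set, int prev)
{
	for (int i = prev + 1; i < 64; i++)
	{
		if (set & ((RelidSet) 1 << i))
			return i;
	}
	return -2;
}

/*
 * Sketch of the lock-set computation: always-locked relations, plus the
 * leaf partitions that survived initial pruning when the plan has any
 * initial pruning steps at all.
 */
static RelidSet
compute_lock_set(RelidSet min_lock_relids,
				 RelidSet surviving_leafparts,
				 int contains_initial_pruning)
{
	if (contains_initial_pruning)
		return min_lock_relids | surviving_leafparts;
	return min_lock_relids;
}

/* Count members by walking the set the way the locking loop would. */
static int
relid_set_count(RelidSet set)
{
	int			n = 0;
	int			rti = -1;

	while ((rti = relid_set_next(set, rti)) >= 0)
		n++;
	return n;
}
```

In this toy model, remembering the set returned by compute_lock_set() plays the role of lockedRelids: a later release step walks exactly that set rather than the whole range table.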
[application/octet-stream] v18-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (17.8K, 3-v18-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch)
From 571424d7f1d5cb8b3ee59853649d35731b033b03 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v18 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set instead in the
latter is an index field which points to the list element
of PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be accessible directly, by simply
iterating over PlannedStmt.partPruneInfos, than to require a walk
of the plan tree to find them.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/nodes/outfuncs.c | 1 -
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
16 files changed, 92 insertions(+), 63 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..72fc273524 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f1fd7f7e8b..f73b8c2607 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..b55cdd2580 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1638,11 +1638,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 4d776e7b51..81f6a9093c 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -299,7 +299,6 @@ outDatum(StringInfo str, Datum value, int typlen, bool typbyval)
* Support functions for nodes with custom_read_write attribute or
* special_read_write attribute
*/
-
static void
_outConst(StringInfo str, const Const *node)
{
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 76606faa3e..58a05cf673 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1426,7 +1426,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1519,6 +1518,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1542,13 +1544,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 06ad856eac..b11249ed8f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,6 +518,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 9cef92cab2..b8d5610593 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1655,21 +1678,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1727,21 +1741,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..d77f7d3aef 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..63a89474db 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 44ffc73f15..d87957ff6c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -480,6 +483,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index dca2a21e7a..f2daabb3b7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -269,8 +272,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannedStmt.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -304,8 +307,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
* Re: generic plans and "initial" pruning
@ 2022-07-13 07:03 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-07-13 07:03 UTC (permalink / raw)
To: Jacob Champion <[email protected]>; +Cc: Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Robert Haas <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Wed, Jul 13, 2022 at 3:40 PM Amit Langote <[email protected]> wrote:
> Rebased over 964d01ae90c.
Sorry, left some pointless hunks in there while rebasing. Fixed in
the attached.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v19-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (17.3K, 2-v19-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch)
From 9fa5cd5f4256b7249ab6f560edca9d3609a126ef Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v19 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set in the plan node
instead is an index field that points to the element of
PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to that node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be accessible directly, by simply
iterating over PlannedStmt.partPruneInfos, than to require a walk
of the plan tree to find them.
---
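The index scheme in this patch can be pictured with a small standalone
sketch. It is not code from the patch: PruneInfo, InfoList, MAX_INFOS,
add_prune_info(), get_prune_info() and merge_into_global() are hypothetical
stand-ins for PartitionPruneInfo, a List, make_partition_pruneinfo(), the
list_nth() lookup and the setrefs.c fix-up respectively, and a fixed-size
array is assumed in place of a real List.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INFOS 16

/* Hypothetical stand-in for PartitionPruneInfo; only the RT index is modeled. */
typedef struct PruneInfo
{
    int         rtindex;
} PruneInfo;

/* Hypothetical stand-in for a List of PruneInfo pointers. */
typedef struct InfoList
{
    PruneInfo  *items[MAX_INFOS];
    int         len;
} InfoList;

/*
 * Append 'info' to 'list' and return its 0-based index, or -1 if there is
 * nothing to add (or no room left in this toy list).
 */
static int
add_prune_info(InfoList *list, PruneInfo *info)
{
    if (info == NULL || list->len >= MAX_INFOS)
        return -1;
    list->items[list->len] = info;
    return list->len++;
}

/* Resolve a plan node's index back to its PruneInfo; NULL if there is none. */
static PruneInfo *
get_prune_info(const InfoList *list, int index)
{
    if (index < 0 || index >= list->len)
        return NULL;
    return list->items[index];
}

/*
 * Fold a per-query list into the global one, shifting RT indexes by
 * 'rtoffset'.  Returns the value to add to every non-negative index that
 * referred to 'local', or -1 if the global list has no room.
 */
static int
merge_into_global(InfoList *global, const InfoList *local, int rtoffset)
{
    int         base = global->len;

    if (global->len + local->len > MAX_INFOS)
        return -1;
    for (int i = 0; i < local->len; i++)
    {
        local->items[i]->rtindex += rtoffset;
        global->items[global->len++] = local->items[i];
    }
    return base;
}
```

A plan node would keep only the int returned by add_prune_info(), add the
base returned by merge_into_global() once its query's list is folded into
the global one, and call get_prune_info() at executor startup. An index of
-1 plays the role that a NULL part_prune_info pointer played before.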
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 92 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..72fc273524 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f1fd7f7e8b..f73b8c2607 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..b55cdd2580 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1638,11 +1638,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e37f2933eb..fd8ab4a167 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 06ad856eac..b11249ed8f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,6 +518,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 9cef92cab2..b8d5610593 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1655,21 +1678,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1727,21 +1741,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..d77f7d3aef 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..63a89474db 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 44ffc73f15..d87957ff6c 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -480,6 +483,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index dca2a21e7a..f2daabb3b7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -269,8 +272,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -304,8 +307,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
[application/octet-stream] v19-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (80.6K, 3-v19-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From b67911f2ae182f7158501e7ce4b1799ff2e1efb4 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v19 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan, and to skip locking the partitions scanned by
those subnodes.
The result of performing initial partition pruning this way, before the
actual execution has started, is passed to the executor as a
PartitionPruneResult, which callers that used plancache.c to get the
plan supply along with the PlannedStmt. It is NULL when the plan is
obtained by calling the planner directly or when the plan obtained from
plancache.c is not a generic one.
---
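The reason the pruning result has to travel with the plan can be shown
with a toy model. This is only a sketch under assumptions, not code from
the patch: four range partitions of ten keys each, a 32-bit mask standing
in for a Bitmapset, and the hypothetical names PruneResult,
initial_prune(), lock_unpruned() and init_subplans(). The case where
redoing the pruning yields a different set is modeled simply by pruning
with a different key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPARTS 4

/* Hypothetical stand-in for PartitionPruneResult: one bit per subplan. */
typedef struct PruneResult
{
    uint32_t    valid_subplans;
} PruneResult;

/* Toy "initial" pruning: partition i is assumed to hold keys i*10 .. i*10+9. */
static uint32_t
initial_prune(int param_key)
{
    uint32_t    valid = 0;

    for (int i = 0; i < NPARTS; i++)
    {
        if (param_key >= i * 10 && param_key <= i * 10 + 9)
            valid |= UINT32_C(1) << i;
    }
    return valid;
}

/*
 * Lock-time step: prune once, lock only the survivors, and save the result
 * for the executor.  Returns the number of partitions locked.
 */
static int
lock_unpruned(int param_key, int locked[NPARTS], PruneResult *result)
{
    int         nlocked = 0;

    result->valid_subplans = initial_prune(param_key);
    for (int i = 0; i < NPARTS; i++)
    {
        locked[i] = (int) ((result->valid_subplans >> i) & 1);
        nlocked += locked[i];
    }
    return nlocked;
}

/*
 * Executor-init step: use the saved result if one was passed, otherwise
 * prune again.  Returns the number of subplans initialized, or -1 if a
 * chosen subplan's partition was never locked.
 */
static int
init_subplans(const PruneResult *result, int param_key,
              const int locked[NPARTS])
{
    uint32_t    valid = result ? result->valid_subplans
                               : initial_prune(param_key);
    int         ninit = 0;

    for (int i = 0; i < NPARTS; i++)
    {
        if ((valid >> i) & 1)
        {
            if (!locked[i])
                return -1;
            ninit++;
        }
    }
    return ninit;
}
```

With the saved result, init_subplans() can only pick subplans whose
partitions were locked by lock_unpruned(). When it is passed NULL and
prunes again, any change in the outcome can select an unlocked partition,
which is the hazard the PartitionPruneResult is meant to rule out.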
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 234 +++++++++++++++++++++----
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 29 +++
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 187 +++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 27 +++
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 13 ++
src/include/nodes/plannodes.h | 21 +++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
32 files changed, 782 insertions(+), 96 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fca29a9a10..d839517693 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -541,7 +541,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e29c2ae206..e41b13a3ea 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 3db859c3ea..631cc07217 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..b0ed96e56c 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 2333aae467..83465e40f8 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..953a476ea5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed at this
+point to figure out the minimal set of child subplans that satisfy those
+pruning steps. AcquireExecutorLocks() looking at a given plan tree will then
+lock only the relations scanned by the child subplans that survived such
+pruning, along with those present in PlannedStmt.minLockRelids. Note that the
+subplans are only notionally pruned in that they are not removed from the plan
+tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a
+PartitionPruneResult node via the QueryDesc. It consists of the set of
+indexes of surviving subplans in their respective parent plan node's list of
+child subplans, saved as a list of bitmapsets, with one element for every
+parent plan node whose PartitionPruneInfo is present in
+PlannedStmt.partPruneInfos. In other words, the executor should not
+re-evaluate the set of initially valid subplans by redoing the initial pruning
+if it was already done by AcquireExecutorLocks(), because the re-evaluation may
+very well end up resulting in a different set of subplans, containing some
+whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 72fc273524..45824624f8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f73b8c2607..7e6dab5623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b55cdd2580..24e6f6e988 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1593,8 +1599,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning(), which
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1611,6 +1619,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the parent plan node (the one to which the PartitionPruneInfo
+ * belongs) that need to be executed, and also the set of RT indexes of
+ * the leaf partitions that will be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1628,8 +1643,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1645,24 +1661,59 @@ ExecInitPartitionPruning(PlanState *planstate,
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1670,7 +1721,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1686,11 +1738,73 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1704,19 +1818,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1771,15 +1887,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before
+ * execution has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key from its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1793,6 +1936,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1803,6 +1947,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1853,6 +1999,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1860,6 +2008,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1881,7 +2030,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1891,7 +2040,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2119,10 +2268,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2157,7 +2310,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2171,6 +2324,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2181,13 +2336,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2214,8 +2371,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2223,7 +2386,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index f9460ae506..a2182a6b1f 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -844,7 +844,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..303a572c02 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 1421686938..d57478bde9 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -623,6 +628,30 @@ readIntCols(int numCols)
return int_vals;
}
+/*
+ * readIndexCols
+ */
+Index *
+readIndexCols(int numCols)
+{
+ int tokenLength,
+ i;
+ const char *token;
+ Index *index_vals;
+
+ if (numCols <= 0)
+ return NULL;
+
+ index_vals = (Index *) palloc(numCols * sizeof(Index));
+ for (i = 0; i < numCols; i++)
+ {
+ token = pg_strtok(&tokenLength);
+ index_vals[i] = atoui(token);
+ }
+
+ return index_vals;
+}
+
/*
* readBoolCols
*/
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b11249ed8f..7141035cc4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,7 +519,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index b8d5610593..da749e331e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d77f7d3aef..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6f18b68856..16bda42f11 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1596,6 +1596,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1971,7 +1972,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1986,6 +1989,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..d1c9605979 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein have been allocated in the
+ * caller's hopefully short-lived context, so will not remain leaked
+ * for long, though reset to avoid its accidentally being looked at.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or a NULL is added to
+ * *part_prune_result_list if needed. The former if the PlannedStmt is from
+ * the existing CachedPlan that is otherwise valid and contains at least one
+ * PartitionPruneInfo that has "initial" pruning steps. Those steps are
+ * performed by calling ExecutorDoInitialPruning() to determine only those
+ * leaf partitions that need to be locked by AcquireExecutorLocks() by pruning
+ * away subplans that don't match the pruning conditions. The
+ * PartitionPruneResult contains a list of bitmapsets of the indexes of
+ * matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,38 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_result =
+ ExecutorDoInitialPruning(plannedstmt, boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1875,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index d549f66d4a..1bbe6b704b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..e57e133f0e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..5c4a282be0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63a89474db..12ea06c2f6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1001,6 +1001,33 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of an ExecutorDoInitialPruning() invocation on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapset of the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() for every
+ * PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT indexes of the
+ * leaf partitions scanned by those subplans across all PartitionPruneInfos
+ * are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor. The executor
+ * refers to this node when initializing the plan nodes which contain subplans
+ * that may have been pruned by ExecutorDoInitialPruning(), rather than
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index cdd6debfa0..b33d9e426d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index d87957ff6c..7957aeb6d7 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,19 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial (pre-exec) pruning
+ * steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f2daabb3b7..1d2c0d9bdf 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -72,8 +72,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1409,6 +1418,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1419,6 +1435,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1463,6 +1481,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
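For readers skimming the patch, the lock-set arithmetic spread across set_plan_references(), AcquireExecutorLocks() and ReleaseExecutorLocks() can be sketched in isolation. This is only an illustration under simplifying assumptions, not code from the patch: RtiSet, compute_min_lock_set(), compute_lock_set(), lock_all() and unlock_all() are hypothetical names, a 64-bit mask stands in for PostgreSQL's Bitmapset, and a per-RT-index counter stands in for the lock manager.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t RtiSet;

#define MAX_RTI 63

RtiSet
rtiset_add(RtiSet s, int rti)
{
	/* RT indexes start at 1; 0 is never a valid member. */
	assert(rti > 0 && rti <= MAX_RTI);
	return s | (UINT64_C(1) << rti);
}

bool
rtiset_has(RtiSet s, int rti)
{
	return ((s >> rti) & 1) != 0;
}

/* Planner side: every RT index minus the prunable leaf partitions. */
RtiSet
compute_min_lock_set(RtiSet all_rtis, RtiSet leafpart_rtis)
{
	return all_rtis & ~leafpart_rtis;
}

/* Plan cache side: add back the leaf partitions that survived pruning. */
RtiSet
compute_lock_set(RtiSet min_lock, RtiSet surviving_leafparts)
{
	return min_lock | surviving_leafparts;
}

/* "Lock" each member and return the set that was actually locked. */
RtiSet
lock_all(RtiSet to_lock, int *lock_counts)
{
	RtiSet		locked = 0;

	for (int rti = 1; rti <= MAX_RTI; rti++)
	{
		if (rtiset_has(to_lock, rti))
		{
			lock_counts[rti]++;
			locked = rtiset_add(locked, rti);
		}
	}
	return locked;
}

/* Release exactly the set remembered by lock_all(), nothing more. */
void
unlock_all(RtiSet locked, int *lock_counts)
{
	for (int rti = 1; rti <= MAX_RTI; rti++)
	{
		if (rtiset_has(locked, rti))
			lock_counts[rti]--;
	}
}
```

The point of returning the locked set from lock_all() is the same as that of lockedRelids_per_stmt in the patch: the release path cannot recompute the set by redoing pruning, so it has to be told which relations were locked.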
* Re: generic plans and "initial" pruning
@ 2022-07-27 03:00 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-07-27 03:00 UTC (permalink / raw)
To: Jacob Champion <[email protected]>; +Cc: Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Robert Haas <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Wed, Jul 13, 2022 at 4:03 PM Amit Langote <[email protected]> wrote:
> On Wed, Jul 13, 2022 at 3:40 PM Amit Langote <[email protected]> wrote:
> > Rebased over 964d01ae90c.
>
> Sorry, left some pointless hunks in there while rebasing. Fixed in
> the attached.
Needed to be rebased again, over 2d04277121f this time.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v20-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (17.3K, 2-v20-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch)
From 8de25528e8f388beffdab3d7c9905712e2f8eeef Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v20 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set instead in the
latter is an index field which points to the list element
of PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be directly accessible than to require
a walk of the plan tree to find them; with this change, finding them
is a simple iteration over PlannedStmt.partPruneInfos.
---
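For readers skimming the thread, the commit message above boils down to
replacing a pointer stored in the plan node with an index into a
statement-level list. The following is only a rough, self-contained sketch
of that lookup. All of the Toy* names are hypothetical stand-ins invented
for illustration; they are not the PostgreSQL structs, and a plain array
stands in for the List that the patch reads with list_nth().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the PostgreSQL definitions. */
typedef struct ToyPruneInfo
{
    int         rtindex;        /* placeholder payload */
} ToyPruneInfo;

typedef struct ToyPlannedStmt
{
    ToyPruneInfo **part_prune_infos;    /* models PlannedStmt.partPruneInfos */
    int         n_part_prune_infos;
} ToyPlannedStmt;

typedef struct ToyAppend
{
    int         part_prune_index;   /* -1 means no run-time pruning */
} ToyAppend;

/*
 * Resolve a node's pruning info through the statement-level list, the way
 * the patch resolves part_prune_index in ExecInitPartitionPruning().
 */
static ToyPruneInfo *
toy_lookup_prune_info(const ToyPlannedStmt *stmt, const ToyAppend *node)
{
    if (node->part_prune_index < 0)
        return NULL;            /* node does no run-time pruning */

    assert(node->part_prune_index < stmt->n_part_prune_infos);
    return stmt->part_prune_infos[node->part_prune_index];
}
```

The point of the indirection is that code holding only the statement can
visit every pruning info by iterating over the list, without walking the
plan tree.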
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 2 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 2 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 92 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index ef2fd46092..72fc273524 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f1fd7f7e8b..f73b8c2607 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index e03ea27299..b55cdd2580 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1638,11 +1638,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..f9c7976ff2 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,8 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
+ estate->es_part_prune_result = NULL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index e37f2933eb..fd8ab4a167 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 06ad856eac..b11249ed8f 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -518,6 +518,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..720f20f563 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1658,21 +1681,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1734,21 +1748,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 9d3c05aed3..d77f7d3aef 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..63a89474db 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,8 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e2081db4ed..a4e6b4db92 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -488,6 +491,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index dca2a21e7a..f2daabb3b7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -69,6 +69,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -269,8 +272,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -304,8 +307,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
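One detail of the patch above that is easy to miss is the index
adjustment in setrefs.c: each subquery's PlannerInfo numbers its
PartitionPruneInfos from zero, so a node's index has to be shifted by the
current length of the global list before that subquery's entries are
appended to it. Below is a minimal sketch of that bookkeeping, assuming
fixed-size int arrays in place of Lists. The Toy* names and
TOY_MAX_INFOS are hypothetical and exist only for this illustration.

```c
#include <assert.h>

#define TOY_MAX_INFOS 16        /* arbitrary capacity for the sketch */

/* Models PlannerGlobal.partPruneInfos (hypothetical stand-in). */
typedef struct ToyGlobal
{
    int         infos[TOY_MAX_INFOS];
    int         ninfos;
} ToyGlobal;

/* Models one subquery's PlannerInfo.partPruneInfos (hypothetical). */
typedef struct ToyRoot
{
    int         infos[TOY_MAX_INFOS];
    int         ninfos;
} ToyRoot;

/*
 * Turn a subquery-local index into a global one.  Must be called before
 * that subquery's entries are appended by toy_flatten(), mirroring the
 * "+= list_length(root->glob->partPruneInfos)" step in the patch.
 */
static int
toy_globalize_index(const ToyGlobal *glob, int local_index)
{
    if (local_index < 0)
        return -1;              /* no run-time pruning; nothing to shift */
    return local_index + glob->ninfos;
}

/* Append a subquery's entries to the global list, preserving order. */
static void
toy_flatten(ToyGlobal *glob, const ToyRoot *root)
{
    for (int i = 0; i < root->ninfos; i++)
    {
        assert(glob->ninfos < TOY_MAX_INFOS);
        glob->infos[glob->ninfos++] = root->infos[i];
    }
}
```

Doing the shift first and the append second is what keeps indexes from
different subqueries from colliding in the flattened list.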
[application/octet-stream] v20-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (80.5K, 3-v20-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From 7a1454c6a1ecde5c871bec5a4d646da4e41a62c3 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v20 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes of a generic
cached plan that need not be initialized during the actual execution
of the plan, and to skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way, before the
actual execution has started, is passed to the executor as a
PartitionPruneResult, which the callers of the executor that used
plancache.c to get the plan supply along with the PlannedStmt. It is NULL
when the plan is obtained by calling the planner directly or when the
plan obtained from plancache.c is not a generic one.
---
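As a rough illustration of the flow described in the commit message
above: pruning runs once before locking, only the surviving partitions
are locked, and the executor then reuses the recorded result rather than
pruning again. This is a toy model under loud assumptions. Partitions
are simple integer ranges, the "pruning step" is a range check against a
single parameter, and every Toy* name is hypothetical; none of this is
the patch's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NPARTS 4

/* Hypothetical range partition covering [lower, upper). */
typedef struct ToyPartition
{
    int         lower;
    int         upper;
    bool        locked;
} ToyPartition;

/* Models PartitionPruneResult: bit i set means subplan i survived. */
typedef struct ToyPruneResult
{
    uint32_t    valid_subplans;
} ToyPruneResult;

/* "Initial" pruning: depends only on the parameter, not on execution. */
static ToyPruneResult
toy_do_initial_pruning(const ToyPartition *parts, int nparts, int param)
{
    ToyPruneResult result = {0};

    for (int i = 0; i < nparts; i++)
    {
        if (param >= parts[i].lower && param < parts[i].upper)
            result.valid_subplans |= (uint32_t) 1 << i;
    }
    return result;
}

/* Lock only the partitions that survived; returns how many were locked. */
static int
toy_acquire_locks(ToyPartition *parts, int nparts, const ToyPruneResult *res)
{
    int         nlocked = 0;

    for (int i = 0; i < nparts; i++)
    {
        if (res->valid_subplans & ((uint32_t) 1 << i))
        {
            parts[i].locked = true;
            nlocked++;
        }
    }
    return nlocked;
}

/*
 * Executor initialization: reuse the recorded result instead of pruning
 * again, so every subplan initialized here is known to be locked.
 */
static int
toy_exec_init(const ToyPartition *parts, int nparts, const ToyPruneResult *res)
{
    int         ninit = 0;

    for (int i = 0; i < nparts; i++)
    {
        if (res->valid_subplans & ((uint32_t) 1 << i))
        {
            assert(parts[i].locked);
            ninit++;
        }
    }
    return ninit;
}
```

If toy_exec_init() recomputed the set instead of taking it from the
result, it could in principle pick a partition that was never locked,
which is the hazard the PartitionPruneResult hand-off is meant to rule
out.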
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 234 +++++++++++++++++++++----
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 187 +++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 27 +++
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 13 ++
src/include/nodes/plannodes.h | 21 +++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
32 files changed, 759 insertions(+), 98 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index fca29a9a10..d839517693 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -541,7 +541,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 9abbb6b555..f6607f2454 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e29c2ae206..e41b13a3ea 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 6b6720c690..374c0ff807 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..b0ed96e56c 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 579825c159..b6285958bc 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..953a476ea5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed at this
+point to figure out the minimal set of child subplans that satisfy those
+pruning steps. AcquireExecutorLocks() looking at a given plan tree will then
+lock only the relations scanned by the child subplans that survived such
+pruning, along with those present in PlannedStmt.minLockRelids. Note that the
+subplans are only notionally pruned in that they are not removed from the plan
+tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a
+PartitionPruneResult node via the QueryDesc. It consists of the set of
+indexes of surviving subplans in their respective parent plan node's list of
+child subplans, saved as a list of bitmapsets, with one element for every
+parent plan node whose PartitionPruneInfo is present in
+PlannedStmt.partPruneInfos. In other words, the executor should not
+re-evaluate the set of initially valid subplans by redoing the initial pruning
+if it was already done by AcquireExecutorLocks(), because the re-evaluation may
+very well end up resulting in a different set of subplans, containing some
+whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 72fc273524..45824624f8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f73b8c2607..7e6dab5623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b55cdd2580..24e6f6e988 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1593,8 +1599,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once, either during executor startup or in ExecutorDoInitialPruning(),
+ * which runs as part of AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1611,6 +1619,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the parent plan node (the one to which the PartitionPruneInfo
+ * belongs) that need to be executed, and also the set of RT indexes
+ * of the leaf partitions that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1628,8 +1643,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1645,24 +1661,59 @@ ExecInitPartitionPruning(PlanState *planstate,
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ prunestate = NULL;
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
/* No pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1670,7 +1721,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1686,11 +1738,73 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors, which omits
+ * detached partitions, just like in the executor proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1704,19 +1818,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1771,15 +1887,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1793,6 +1936,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1803,6 +1947,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -1853,6 +1999,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -1860,6 +2008,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -1881,7 +2030,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -1891,7 +2040,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2119,10 +2268,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2157,7 +2310,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2171,6 +2324,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2181,13 +2336,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2214,8 +2371,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2223,7 +2386,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 076226868f..ed359b5153 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 29bc26669b..303a572c02 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index bee62fc15c..e7886afa35 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -542,7 +547,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index b11249ed8f..7141035cc4 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,7 +519,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..61d6934978 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d77f7d3aef..952c5b8327 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 078fbdb5a0..02fc5a011b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1603,6 +1603,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1978,7 +1979,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1993,6 +1996,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..d1c9605979 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and the objects in it were allocated in the caller's
+ * (hopefully short-lived) context, so they will not stay leaked for
+ * long. Reset the pointer anyway so that nobody looks at it by accident.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or a NULL is added to
+ * *part_prune_result_list if needed. The former if the PlannedStmt is from
+ * the existing CachedPlan that is otherwise valid and contains at least one
+ * PartitionPruneInfo that has "initial" pruning steps. Those steps are
+ * performed by calling ExecutorDoInitialPruning() to determine only those
+ * leaf partitions that need to be locked by AcquireExecutorLocks() by pruning
+ * away subplans that don't match the pruning conditions. The
+ * PartitionPruneResult contains a list of bitmapsets of the indexes of
+ * matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,38 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_result = ExecutorDoInitialPruning(plannedstmt,
+ boundParams);
+
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1875,58 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 3a161bdb88..27407a7f0f 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..e57e133f0e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d68a6b9d28..5c4a282be0 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63a89474db..12ea06c2f6 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1001,6 +1001,33 @@ typedef struct DomainConstraintState
*/
typedef TupleTableSlot *(*ExecProcNodeMtd) (struct PlanState *pstate);
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of invoking ExecutorDoInitialPruning() on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() for every
+ * PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT indexes of the
+ * leaf partitions scanned by those subplans across all PartitionPruneInfos
+ * are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor. The executor
+ * refers to this node when initializing the plan nodes which contain subplans
+ * that may have been pruned by ExecutorDoInitialPruning(), rather than
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
+
/* ----------------
* PlanState node
*
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index cdd6debfa0..b33d9e426d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index a4e6b4db92..86eda6c7c3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,19 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial (pre-exec) pruning
+ * steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index f2daabb3b7..1d2c0d9bdf 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -72,8 +72,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial (pre-exec) pruning
+ * steps in them? */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1409,6 +1418,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1419,6 +1435,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1463,6 +1481,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
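The locking rule that AcquireExecutorLocks() and ReleaseExecutorLocks() follow in the patch above can be sketched in isolation. This is a minimal illustration under stated assumptions, not code from the patch: RelidSet is a hypothetical 64-bit stand-in for Bitmapset, lock_counts stands in for the lock manager, and all function names are invented for the example. It shows the set to lock being the always-locked relations plus the leaf partitions that survive initial pruning, and the set actually locked being remembered so that exactly those relations can be unlocked if the plan turns out to be invalid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t RelidSet;

#define MAX_RTI 63

/* Add an RT index (1-based, as in a range table) to the set. */
static RelidSet
relid_set_add(RelidSet set, int rti)
{
    assert(rti > 0 && rti <= MAX_RTI);
    return set | ((RelidSet) 1 << rti);
}

/* Return the smallest member greater than prev, or -1 if there is none. */
static int
relid_set_next(RelidSet set, int prev)
{
    for (int i = prev + 1; i <= MAX_RTI; i++)
    {
        if (set & ((RelidSet) 1 << i))
            return i;
    }
    return -1;
}

/*
 * Relations to lock: the ones that must always be locked, plus the leaf
 * partitions that survived initial pruning (only when the plan has any
 * initial pruning steps at all).
 */
static RelidSet
compute_lock_set(RelidSet min_lock_relids, RelidSet surviving_leaf_rtis,
                 int contains_initial_pruning)
{
    if (!contains_initial_pruning)
        return min_lock_relids;
    return min_lock_relids | surviving_leaf_rtis;
}

/* "Lock" every member and return the set of RT indexes actually locked. */
static RelidSet
lock_all(RelidSet to_lock, int *lock_counts)
{
    RelidSet locked = 0;
    int      rti = -1;

    while ((rti = relid_set_next(to_lock, rti)) >= 0)
    {
        lock_counts[rti]++;
        locked = relid_set_add(locked, rti);
    }
    return locked;
}

/* Release exactly the locks recorded by an earlier lock_all() call. */
static void
unlock_all(RelidSet locked, int *lock_counts)
{
    int rti = -1;

    while ((rti = relid_set_next(locked, rti)) >= 0)
    {
        assert(lock_counts[rti] > 0);
        lock_counts[rti]--;
    }
}
```

Recording the locked set matters because the release path cannot redo pruning cheaply or reliably; it only needs to undo what was done. With RT indexes 1 and 2 always locked and partition 5 surviving pruning, lock_all() touches 1, 2 and 5 and leaves pruned partitions 3 and 4 alone.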
* Re: generic plans and "initial" pruning
@ 2022-07-27 16:27 Robert Haas <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Robert Haas @ 2022-07-27 16:27 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Tue, Jul 26, 2022 at 11:01 PM Amit Langote <[email protected]> wrote:
> Needed to be rebased again, over 2d04277121f this time.
0001 adds es_part_prune_result but does not use it, so maybe the
introduction of that field should be deferred until it's needed for
something.
I wonder whether it's really necessary to add the PartitionPruneInfo
objects to a list in PlannerInfo first and then roll them up into
PlannerGlobal later. I know we do that for range table entries, but
I've never quite understood why we do it that way instead of creating
a flat range table in PlannerGlobal from the start. And so by
extension I wonder whether this table couldn't be flat from the start
also.
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2022-07-29 04:20 Amit Langote <[email protected]>
parent: Robert Haas <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-07-29 04:20 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <[email protected]> wrote:
> On Tue, Jul 26, 2022 at 11:01 PM Amit Langote <[email protected]> wrote:
> > Needed to be rebased again, over 2d04277121f this time.
Thanks for looking.
> 0001 adds es_part_prune_result but does not use it, so maybe the
> introduction of that field should be deferred until it's needed for
> something.
Oops, looks like a mistake I made when splitting the patch. Will move that bit to 0002.
> I wonder whether it's really necessary to add the PartitionPruneInfo
> objects to a list in PlannerInfo first and then roll them up into
> PlannerGlobal later. I know we do that for range table entries, but
> I've never quite understood why we do it that way instead of creating
> a flat range table in PlannerGlobal from the start. And so by
> extension I wonder whether this table couldn't be flat from the start
> also.
Tom may want to correct me but my understanding of why the planner
waits till the end of planning to start populating the PlannerGlobal
range table is that it is not until then that we know which subqueries
will be scanned by the final plan tree, so also whose range table
entries will be included in the range table passed to the executor. I
can see that subquery pull-up causes a pulled-up subquery's range
table entries to be added into the parent query's range table and all
its nodes to be changed using OffsetVarNodes() to refer to the new RT
indexes. But for subqueries that are not pulled up, their subplans'
nodes (present in PlannerGlobal.subplans) would still refer to the original RT
indexes (per range table in the corresponding PlannerGlobal.subroot),
which must be fixed and the end of planning is the time to do so. Or
maybe that could be done when build_subplan() creates a subplan and
adds it to PlannerGlobal.subplans, but for some reason it's not?
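The renumbering described above can be illustrated with a minimal sketch. It is not PostgreSQL code: ToyRangeTable, ToyVar, toy_append_rtable() and toy_offset_vars() are hypothetical stand-ins, and the fixed-size array replaces the planner's List machinery. It only shows why every reference into a subquery's range table has to be shifted once that table is appended to a flat one.

```c
#include <assert.h>

#define MAX_RTE 64              /* arbitrary capacity for this sketch */

/* Hypothetical stand-in for a flat range table (not PostgreSQL's List). */
typedef struct ToyRangeTable
{
    int relids[MAX_RTE];        /* one slot per range table entry */
    int length;                 /* number of slots in use */
} ToyRangeTable;

/* Hypothetical stand-in for a Var: only the range table reference. */
typedef struct ToyVar
{
    int varno;                  /* 1-based index into a range table */
} ToyVar;

/*
 * Append a subquery's range table entries to the flat table.
 * Returns the offset that must be added to every RT index that used to
 * refer to the subquery's own table, or -1 if the flat table is full.
 */
static int
toy_append_rtable(ToyRangeTable *flat, const int *sub_relids, int sub_len)
{
    int rtoffset = flat->length;
    int i;

    if (sub_len < 0 || flat->length + sub_len > MAX_RTE)
        return -1;
    for (i = 0; i < sub_len; i++)
        flat->relids[flat->length++] = sub_relids[i];
    return rtoffset;
}

/* Shift every Var of the subquery so it points into the flat table. */
static void
toy_offset_vars(ToyVar *vars, int nvars, int rtoffset)
{
    int i;

    for (i = 0; i < nvars; i++)
        vars[i].varno += rtoffset;
}
```

In this sketch the shift is a separate pass over the subquery's Vars, which is the extra work being discussed; when it happens (at subplan creation or at the end of planning) does not change what has to be touched.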
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2022-07-29 04:55 Tom Lane <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 2 replies; 82+ messages in thread
From: Tom Lane @ 2022-07-29 04:55 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
Amit Langote <[email protected]> writes:
> On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <[email protected]> wrote:
>> I wonder whether it's really necessary to add the PartitionPruneInfo
>> objects to a list in PlannerInfo first and then roll them up into
>> PlannerGlobal later. I know we do that for range table entries, but
>> I've never quite understood why we do it that way instead of creating
>> a flat range table in PlannerGlobal from the start. And so by
>> extension I wonder whether this table couldn't be flat from the start
>> also.
> Tom may want to correct me but my understanding of why the planner
> waits till the end of planning to start populating the PlannerGlobal
> range table is that it is not until then that we know which subqueries
> will be scanned by the final plan tree, so also whose range table
> entries will be included in the range table passed to the executor.
It would not be profitable to flatten the range table before we've
done remove_useless_joins. We'd end up with useless entries from
subqueries that ultimately aren't there. We could perhaps do it
after we finish that phase, but I don't really see the point: it
wouldn't be better than what we do now, just the same work at a
different time.
regards, tom lane
* Re: generic plans and "initial" pruning
@ 2022-07-29 12:22 Robert Haas <[email protected]>
parent: Tom Lane <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Robert Haas @ 2022-07-29 12:22 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Amit Langote <[email protected]>; Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
On Fri, Jul 29, 2022 at 12:55 AM Tom Lane <[email protected]> wrote:
> It would not be profitable to flatten the range table before we've
> done remove_useless_joins. We'd end up with useless entries from
> subqueries that ultimately aren't there. We could perhaps do it
> after we finish that phase, but I don't really see the point: it
> wouldn't be better than what we do now, just the same work at a
> different time.
That's not quite my question, though. Why do we ever build a non-flat
range table in the first place? Like, instead of assigning indexes
relative to the current subquery level, why not just assign them
relative to the whole query from the start? It can't really be that
we've done it this way because of remove_useless_joins(), because
we've been building separate range tables and later flattening them
for longer than join removal has existed as a feature.
What bugs me is that it's very much not free. By building a bunch of
separate range tables and combining them later, we generate extra
work: we have to go back and adjust RT indexes after-the-fact. We pay
that overhead for every query, not just the ones that end up with some
unused entries in the range table. And why would it matter if we did
end up with some useless entries in the range table, anyway? If
there's some semantic difference, we could add a flag to mark those
entries as needing to be ignored, which seems way better than crawling
all over the whole tree adjusting RTIs everywhere.
I don't really expect that we're ever going to change this -- and
certainly not on this thread. The idea of running around and replacing
RT indexes all over the tree is deeply embedded in the system. But are
we really sure we want to add a second kind of index that we have to
run around and adjust at the same time?
If we are, so be it, I guess. It just looks really ugly and unnecessary to me.
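To make the "second kind of index" concrete, here is a minimal sketch of the two adjustments that happen when one query level's pruning information is merged into a global list. It is modeled loosely on the setrefs.c hunks of the patch under review, but every type and function name here (ToyPruneInfo, ToyInfoList, ToyAppend, toy_merge_level) is a hypothetical stand-in, not PostgreSQL's API.

```c
#include <assert.h>

#define MAX_INFOS 32            /* arbitrary capacity for this sketch */

/* Hypothetical stand-in for a PartitionedRelPruneInfo. */
typedef struct ToyPruneInfo
{
    int rtindex;                /* RT index of the partitioned table */
} ToyPruneInfo;

/* Hypothetical stand-in for a list of pruning infos. */
typedef struct ToyInfoList
{
    ToyPruneInfo items[MAX_INFOS];
    int length;
} ToyInfoList;

/* Hypothetical stand-in for an Append/MergeAppend plan node. */
typedef struct ToyAppend
{
    int part_prune_index;       /* position in the info list, or -1 */
} ToyAppend;

/*
 * Merge one query level's infos into the global list.
 * Two different indexes need adjusting:
 *   1. each node's list position shifts by the global list's old length;
 *   2. each info's rtindex shifts by the range table offset.
 * Returns the old global length, or -1 if the global list is full.
 */
static int
toy_merge_level(ToyInfoList *global, const ToyInfoList *local,
                ToyAppend *nodes, int nnodes, int rtoffset)
{
    int base = global->length;
    int i;

    if (base + local->length > MAX_INFOS)
        return -1;

    for (i = 0; i < nnodes; i++)
    {
        if (nodes[i].part_prune_index >= 0)
            nodes[i].part_prune_index += base;
    }

    for (i = 0; i < local->length; i++)
    {
        ToyPruneInfo copy = local->items[i];

        copy.rtindex += rtoffset;
        global->items[global->length++] = copy;
    }
    return base;
}
```

The node indexes must be shifted before the local infos are appended, since the shift depends on the global list's length at that moment.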
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2022-07-29 15:04 Tom Lane <[email protected]>
parent: Tom Lane <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Tom Lane @ 2022-07-29 15:04 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Amit Langote <[email protected]>; Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
Robert Haas <[email protected]> writes:
> That's not quite my question, though. Why do we ever build a non-flat
> range table in the first place? Like, instead of assigning indexes
> relative to the current subquery level, why not just assign them
> relative to the whole query from the start?
We could probably make that work, but I'm skeptical that it would
really be an improvement overall, for a couple of reasons.
(1) The need for merge-rangetables-and-renumber-Vars logic doesn't
go away. It just moves from setrefs.c to the rewriter, which would
have to do it when expanding views. This would be a net loss
performance-wise, I think, because setrefs.c can do it as part of a
parsetree scan that it has to perform anyway for other housekeeping
reasons; but the rewriter would need a brand new pass over the tree.
Admittedly that pass would only happen for view replacement, but
it's still not open-and-shut that there'd be a performance win.
(2) The need for varlevelsup and similar fields doesn't go away,
I think, because we need those for semantic purposes such as
discovering the query level that aggregates are associated with.
That means that subquery flattening still has to make a pass over
the tree to touch every Var's varlevelsup; so not having to adjust
varno at the same time would save little.
I'm not sure whether I think it's a net plus or net minus that
varno would become effectively independent of varlevelsup.
It'd be different from the way we think of them now, for sure,
and I think it'd take awhile to flush out bugs arising from such
a redefinition.
> I don't really expect that we're ever going to change this -- and
> certainly not on this thread. The idea of running around and replacing
> RT indexes all over the tree is deeply embedded in the system. But are
> we really sure we want to add a second kind of index that we have to
> run around and adjust at the same time?
You probably want to avert your eyes from [1], then ;-). Although
I'm far from convinced that the cross-list index fields currently
proposed there are actually necessary; the cost to adjust them
during rangetable merging could outweigh any benefit.
regards, tom lane
[1] https://www.postgresql.org/message-id/flat/CA+HiwqGjJDmUhDSfv-U2qhKJjt9ST7Xh9JXC_irsAQ1TAUsJYg@mail....
* Re: generic plans and "initial" pruning
@ 2022-07-29 15:56 Robert Haas <[email protected]>
parent: Tom Lane <[email protected]>
0 siblings, 0 replies; 82+ messages in thread
From: Robert Haas @ 2022-07-29 15:56 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Amit Langote <[email protected]>; Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
On Fri, Jul 29, 2022 at 11:04 AM Tom Lane <[email protected]> wrote:
> We could probably make that work, but I'm skeptical that it would
> really be an improvement overall, for a couple of reasons.
>
> (1) The need for merge-rangetables-and-renumber-Vars logic doesn't
> go away. It just moves from setrefs.c to the rewriter, which would
> have to do it when expanding views. This would be a net loss
> performance-wise, I think, because setrefs.c can do it as part of a
> parsetree scan that it has to perform anyway for other housekeeping
> reasons; but the rewriter would need a brand new pass over the tree.
> Admittedly that pass would only happen for view replacement, but
> it's still not open-and-shut that there'd be a performance win.
>
> (2) The need for varlevelsup and similar fields doesn't go away,
> I think, because we need those for semantic purposes such as
> discovering the query level that aggregates are associated with.
> That means that subquery flattening still has to make a pass over
> the tree to touch every Var's varlevelsup; so not having to adjust
> varno at the same time would save little.
>
> I'm not sure whether I think it's a net plus or net minus that
> varno would become effectively independent of varlevelsup.
> It'd be different from the way we think of them now, for sure,
> and I think it'd take awhile to flush out bugs arising from such
> a redefinition.
Interesting. Thanks for your thoughts. I guess it's not as clear-cut
as I thought, but I still can't help feeling like we're doing an awful
lot of expensive rearrangement at the end of query planning.
I kind of wonder whether varlevelsup is the wrong idea. Like, suppose
we instead handed out subquery identifiers serially, sort of like what
we do with SubTransactionId values. Then instead of testing whether
varlevelsup>0 you test whether varsubqueryid==mysubqueryid. If you
flatten a query into its parent, you still need to adjust every var
that refers to the dead subquery, but you don't need to adjust vars
that refer to subqueries underneath it. Their level changes, but their
identity doesn't. Maybe that doesn't really help that much, but it's
always struck me as a little unfortunate that we basically test
whether a var is equal by testing whether the varno and varlevelsup
are equal. That only works if you assume that you can never end up
comparing two vars from thoroughly unrelated parts of the tree, such
that the subquery one level up from one might be different from the
subquery one level up from the other.
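The ambiguity described above can be shown with a small sketch. Neither struct is PostgreSQL's Var: LevelVar mimics the existing (varno, varlevelsup) comparison, while IdVar uses the hypothetical varsubqueryid field floated in this message, which does not exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: a Var identified relative to its own position. */
typedef struct LevelVar
{
    int varno;          /* index into the referenced level's range table */
    int varlevelsup;    /* how many query levels up the reference goes */
} LevelVar;

/* Hypothetical stand-in: a Var identified by a serially assigned id. */
typedef struct IdVar
{
    int varno;          /* index into the referenced subquery's range table */
    int varsubqueryid;  /* id of the subquery that owns that range table */
} IdVar;

/* Relative comparison: only meaningful if both Vars sit at the same place. */
static bool
level_var_equal(const LevelVar *a, const LevelVar *b)
{
    return a->varno == b->varno && a->varlevelsup == b->varlevelsup;
}

/* Absolute comparison: meaningful wherever the two Vars sit in the tree. */
static bool
id_var_equal(const IdVar *a, const IdVar *b)
{
    return a->varno == b->varno && a->varsubqueryid == b->varsubqueryid;
}
```

Two Vars taken from unrelated parts of the tree, each pointing at entry 1 of "the subquery one level up", compare equal under level_var_equal() even though their parents differ; with distinct parent ids, id_var_equal() tells them apart.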
> > I don't really expect that we're ever going to change this -- and
> > certainly not on this thread. The idea of running around and replacing
> > RT indexes all over the tree is deeply embedded in the system. But are
> > we really sure we want to add a second kind of index that we have to
> > run around and adjust at the same time?
>
> You probably want to avert your eyes from [1], then ;-). Although
> I'm far from convinced that the cross-list index fields currently
> proposed there are actually necessary; the cost to adjust them
> during rangetable merging could outweigh any benefit.
I really like the idea of that patch overall, actually; I think
permissions checking is a good example of something that shouldn't
require walking the whole query tree but currently does. And actually,
I think the same thing is true here: we shouldn't need to walk the
whole query tree to find the pruning information, but right now we do.
I'm just uncertain whether what Amit has implemented is the
least-annoying way to go about it... any thoughts on that,
specifically as it pertains to this patch?
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2022-07-29 16:47 Tom Lane <[email protected]>
parent: Robert Haas <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Tom Lane @ 2022-07-29 16:47 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Amit Langote <[email protected]>; Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
Robert Haas <[email protected]> writes:
> ... it's
> always struck me as a little unfortunate that we basically test
> whether a var is equal by testing whether the varno and varlevelsup
> are equal. That only works if you assume that you can never end up
> comparing two vars from thoroughly unrelated parts of the tree, such
> that the subquery one level up from one might be different from the
> subquery one level up from the other.
Yeah, that's always bothered me a little as well. I've yet to see a
case where it causes a problem in practice. But I think that if, say,
we were to try to do any sort of cross-query-level optimization, then
the ambiguity could rise up to bite us. That might be a situation
where a flat rangetable would be worth the trouble.
> I'm just uncertain whether what Amit has implemented is the
> least-annoying way to go about it... any thoughts on that,
> specifically as it pertains to this patch?
I haven't looked at this patch at all. I'll try to make some
time for it, but probably not today.
regards, tom lane
* Re: generic plans and "initial" pruning
@ 2022-07-29 16:55 Robert Haas <[email protected]>
parent: Tom Lane <[email protected]>
0 siblings, 0 replies; 82+ messages in thread
From: Robert Haas @ 2022-07-29 16:55 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Amit Langote <[email protected]>; Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
On Fri, Jul 29, 2022 at 12:47 PM Tom Lane <[email protected]> wrote:
> > I'm just uncertain whether what Amit has implemented is the
> > least-annoying way to go about it... any thoughts on that,
> > specifically as it pertains to this patch?
>
> I haven't looked at this patch at all. I'll try to make some
> time for it, but probably not today.
OK, thanks. The preliminary patch I'm talking about here is pretty
short, so you could probably look at that part of it, at least, in
some relatively small amount of time. And I think it's also in pretty
reasonable shape apart from this issue. But, as usual, there's the
question of how well one can evaluate a preliminary patch without
reviewing the full patch in detail.
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2022-10-12 07:36 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-10-12 07:36 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Fri, Jul 29, 2022 at 1:20 PM Amit Langote <[email protected]> wrote:
> On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <[email protected]> wrote:
> > 0001 adds es_part_prune_result but does not use it, so maybe the
> > introduction of that field should be deferred until it's needed for
> > something.
>
> Oops, looks like a mistake I made when splitting the patch. Will move that bit to 0002.
Fixed that and also noticed that I had defined PartitionPruneResult in
the wrong header (execnodes.h). As a result, PartitionPruneResult
nodes could not be written and read, because
src/backend/nodes/gen_node_support.pl doesn't create _out* and _read*
routines for the nodes defined in execnodes.h. I moved its definition
to plannodes.h, even though it is not actually the planner that
instantiates those nodes; no other include/nodes header seems a better fit.
One more thing I realized is that Bitmapsets added to the List
PartitionPruneResult.valid_subplan_offs_list are not actually
read/write-able. That's a problem that I also faced in [1], so I
proposed a patch there to make Bitmapset a read/write-able Node and
mark (only) the Bitmapsets that are added into read/write-able node
trees with the corresponding NodeTag. I'm including that patch here
as well (0002) for the main patch to work (pass
-DWRITE_READ_PARSE_PLAN_TREES build tests), though it might make sense
to discuss it in its own thread?
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1] https://www.postgresql.org/message-id/CA%2BHiwqH80qX1ZLx3HyHmBrOzLQeuKuGx6FzGep0F_9zw9L4PAA%40mail.g...
Attachments:
[application/octet-stream] v21-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (17.2K, 2-v21-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch)
From 06cda14113c3572440a716a4aacb250b2ed52f52 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v21 1/3] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set instead in the
latter is an index field which points to the list element
of PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree and it will need to consult the
PartitionPruneInfos referenced therein to do so. It would be better
for the PartitionPruneInfos to be accessible directly, by simply
iterating over PlannedStmt.partPruneInfos, than to require a walk of
the plan tree to find them.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 90 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d78862e660..32475e33ff 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 99512826c5..aca0c6f323 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..80197d5141 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1791,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..21f4c10937 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ab4d8e201d..2bfb817d75 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5d0fd6e072..31fff597a7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,6 +519,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..720f20f563 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1658,21 +1681,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1734,21 +1748,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6188bf69cb..6565b6ed01 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..4a741b053f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6bda383bea..e392fb6fc0 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -503,6 +506,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 21e642a64c..3eb3e6e527 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -70,6 +70,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -270,8 +273,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -305,8 +308,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
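A minimal sketch of the idea behind the patch above, in which a plan node stores an integer index into a shared list instead of a direct pointer to its pruning info, and -1 means "no run-time pruning". All names here (ToyPruneInfo, ToyPlannerInfo, toy_make_pruneinfo, toy_lookup_pruneinfo) are hypothetical stand-ins for illustration only and are not part of PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PRUNE_INFOS 16

/* Hypothetical stand-in for PartitionPruneInfo. */
typedef struct ToyPruneInfo
{
	int			nparts;
} ToyPruneInfo;

/* Hypothetical stand-in for the planner-side list of pruning infos. */
typedef struct ToyPlannerInfo
{
	const ToyPruneInfo *infos[TOY_MAX_PRUNE_INFOS];
	int			ninfos;
} ToyPlannerInfo;

/*
 * Register 'info' and return its index in root->infos, or -1 if there is
 * nothing to register.  (This toy also returns -1 when its fixed-size
 * array is full; a real list would simply grow.)
 */
static int
toy_make_pruneinfo(ToyPlannerInfo *root, const ToyPruneInfo *info)
{
	if (info == NULL || root->ninfos >= TOY_MAX_PRUNE_INFOS)
		return -1;
	root->infos[root->ninfos++] = info;
	return root->ninfos - 1;
}

/* Resolve an index stored in a plan node back to its pruning info. */
static const ToyPruneInfo *
toy_lookup_pruneinfo(const ToyPlannerInfo *root, int index)
{
	if (index < 0 || index >= root->ninfos)
		return NULL;
	return root->infos[index];
}
```

Keeping all pruning infos in one flat list lets code that never walks the plan tree iterate over them directly, while each node still reaches its own entry through the stored index.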
[application/octet-stream] v21-0003-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (81.7K, 3-v21-0003-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From ce28c4cfe8bc69e313ba7f59b048fe96f73139a6 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v21 3/3] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan and skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way before the
actual execution has started is passed to the executor in a
PartitionPruneResult, which the callers of the executor that used
plancache.c to get the plan supply along with the PlannedStmt. It is NULL
in the cases in which the plan is obtained by calling the planner
directly or if the plan obtained by plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 55 ++++++
src/backend/executor/execParallel.c | 27 ++-
src/backend/executor/execPartition.c | 238 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 187 ++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 2 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 47 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 763 insertions(+), 100 deletions(-)
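A minimal sketch of the "prune once, reuse the result" flow that the commit message describes, assuming a toy model in which partition i holds keys i*10 through i*10+9 and a single integer parameter drives pruning. Every name below (ToyPruneResult, toy_initial_prune, toy_acquire_locks, toy_init_subplans) is hypothetical; the sketch only illustrates why the executor should reuse the lock-time result rather than prune again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PartitionPruneResult: bit i = subplan i survives. */
typedef struct ToyPruneResult
{
	uint32_t	valid_subplans;
} ToyPruneResult;

/* Toy "initial pruning": keep only the partition whose key range holds param. */
static uint32_t
toy_initial_prune(int nparts, int param)
{
	uint32_t	valid = 0;

	for (int i = 0; i < nparts && i < 32; i++)
	{
		if (param >= i * 10 && param <= i * 10 + 9)
			valid |= (uint32_t) 1 << i;
	}
	return valid;
}

/*
 * Lock phase: prune first, "lock" only the survivors, and remember the
 * result for the executor.  Returns the set of locked partitions.
 */
static uint32_t
toy_acquire_locks(int nparts, int param, ToyPruneResult *result)
{
	result->valid_subplans = toy_initial_prune(nparts, param);
	return result->valid_subplans;
}

/*
 * Executor init phase: if a result was handed over, use it as is, so the
 * initialized subplans are exactly the locked ones; otherwise prune here.
 */
static uint32_t
toy_init_subplans(int nparts, int param, const ToyPruneResult *result)
{
	if (result != NULL)
		return result->valid_subplans;
	return toy_initial_prune(nparts, param);
}
```

In this model, pruning again at init time with inputs that evaluate differently could select a partition that was never locked; handing over the saved result rules that out by construction.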
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 2527e66059..df4b0dcf0e 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..462651910a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..219c63fa81 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, PartitionPruneResult *part_prune_result,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_result, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 6b6720c690..374c0ff807 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..b0ed96e56c 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index c4b54d0547..69e02e0346 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_result_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_result_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_result_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_result, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..953a476ea5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed at this
+point to figure out the minimal set of child subplans that satisfy those
+pruning steps. AcquireExecutorLocks() looking at a given plan tree will then
+lock only the relations scanned by the child subplans that survived such
+pruning, along with those present in PlannedStmt.minLockRelids. Note that the
+subplans are only notionally pruned in that they are not removed from the plan
+tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a
+PartitionPruneResult node via the QueryDesc. It consists of the set of
+indexes of surviving subplans in their respective parent plan node's list of
+child subplans, saved as a list of bitmapsets, with one element for every
+parent plan node whose PartitionPruneInfo is present in
+PlannedStmt.partPruneInfos. In other words, the executor should not
+re-evaluate the set of initially valid subplans by redoing the initial pruning
+if it was already done by AcquireExecutorLocks(), because the re-evaluation may
+very well end up resulting in a different set of subplans, containing some
+whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 32475e33ff..6e2cd1596f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,58 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a PartitionPruneResult node that contains a list of those
+ * bitmapsets, with one element for every PartitionPruneInfo, and a bitmapset
+ * of the RT indexes of all the leaf partitions scanned by those chosen
+ * subplans. Note that the latter is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+PartitionPruneResult *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params)
+{
+ PartitionPruneResult *result;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ result = makeNode(PartitionPruneResult);
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *valid_subplan_offs;
+
+ valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ &result->scan_leafpart_rtis);
+ if (valid_subplan_offs)
+ valid_subplan_offs->type = T_Bitmapset;
+ result->valid_subplan_offs_list =
+ lappend(result->valid_subplan_offs_list,
+ valid_subplan_offs);
+ }
+
+ return result;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +859,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ PartitionPruneResult *part_prune_result = queryDesc->part_prune_result;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +880,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_result = part_prune_result;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..abae5b8623 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITIONPRUNERESULT UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_result_data;
+ char *part_prune_result_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_result_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_result_data = nodeToString(estate->es_part_prune_result);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized PartitionPruneResult. */
+ part_prune_result_len = strlen(part_prune_result_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_result_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized PartitionPruneResult */
+ part_prune_result_space = shm_toc_allocate(pcxt->toc, part_prune_result_len);
+ memcpy(part_prune_result_space, part_prune_result_data, part_prune_result_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITIONPRUNERESULT,
+ part_prune_result_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_result_space;
char *paramspace;
PlannedStmt *pstmt;
+ PartitionPruneResult *part_prune_result;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,18 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_result_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITIONPRUNERESULT, false);
+ part_prune_result = (PartitionPruneResult *)
+ stringToNode(part_prune_result_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_result,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 80197d5141..b612c24d62 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1746,8 +1752,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1764,6 +1772,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the parent plan node (the one to which the PartitionPruneInfo
+ * belongs) that need to be executed, and also the set of RT indexes of
+ * the leaf partitions that will be scanned by those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1781,8 +1796,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,28 +1810,62 @@ ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = estate->es_part_prune_result;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_result
+ * has been set.
+ */
+ if (pruneresult)
+ do_pruning = pruneinfo->needs_exec_pruning;
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans =
+ list_nth(pruneresult->valid_subplan_offs_list, part_prune_index);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1823,7 +1873,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1839,11 +1890,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started and thus need the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1857,19 +1971,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1924,15 +2040,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1946,6 +2089,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1956,6 +2100,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2006,6 +2152,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2013,6 +2161,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2034,7 +2183,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2044,7 +2193,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2272,10 +2421,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2310,7 +2463,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2324,6 +2477,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2334,13 +2489,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2367,8 +2524,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2376,7 +2539,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 21f4c10937..bb7d028463 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -134,6 +134,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_result = NULL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index e134a82ff7..901768cc34 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NULL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..b3faeae2af 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_result_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_result_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_result_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ PartitionPruneResult *part_prune_result = lfirst_node(PartitionPruneResult, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_result,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 4d6902d3ac..c34226a83b 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -799,7 +804,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 31fff597a7..4097cf7164 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..61d6934978 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also adjust the RT indexes of leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
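
As a rough illustration of the set arithmetic in the setrefs.c hunk above (not code from the patch): the planner starts from the full range of adjusted RT indexes, removes the leaf partitions that are subject to initial pruning, and the survivors of pruning are added back at lock time. The sketch below assumes a hypothetical RtiSet type, a 64-bit mask standing in for Bitmapset, and hypothetical helper names; it only works for RT indexes below 64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t RtiSet;

/* Analogue of bms_add_range(): add every RT index in [lo, hi]. */
static RtiSet
rtiset_add_range(RtiSet s, int lo, int hi)
{
    assert(lo >= 0 && hi < 64);
    for (int i = lo; i <= hi; i++)
        s |= (UINT64_C(1) << i);
    return s;
}

/* Analogue of bms_del_members(): remove every member of del from s. */
static RtiSet
rtiset_del_members(RtiSet s, RtiSet del)
{
    return s & ~del;
}

/*
 * Planner side: all RT indexes of the query, minus the leaf partitions
 * that initial pruning might eliminate.
 */
static RtiSet
min_lock_relids(int rtoffset, int nrtes, RtiSet prunable_leaf_rtis)
{
    RtiSet all = rtiset_add_range(0, rtoffset + 1, rtoffset + nrtes);

    return rtiset_del_members(all, prunable_leaf_rtis);
}

/*
 * Lock-acquisition side: the minimal set plus whichever leaf partitions
 * survived initial pruning.
 */
static RtiSet
relids_to_lock(RtiSet min_lock, RtiSet surviving_leaf_rtis)
{
    return min_lock | surviving_leaf_rtis;
}
```

With five range table entries of which 3, 4 and 5 are prunable leaf partitions, the minimal set holds only entries 1 and 2; if pruning leaves partition 4, three relations end up being locked instead of five.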
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6565b6ed01..37f3e6af61 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
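
The rti_map filled in above is what the executor-side recursion (find_matching_subplans_recurse() in this patch) consults to report leaf partition RT indexes alongside subplan IDs. A minimal standalone sketch of that traversal follows. It is an illustration, not the patch's code: PruneLevel, its "surviving" field (standing in for the result of running the pruning steps) and the 64-bit masks used in place of Bitmapset are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified analogue of PartitionedRelPruningData. */
typedef struct PruneLevel
{
    int             nparts;
    const int      *subplan_map;   /* >= 0: leaf subplan index, else -1 */
    const int      *subpart_map;   /* >= 0: index of sub-partitioned level, else -1 */
    const unsigned *rti_map;       /* RT index of leaf partition, 0 if none */
    uint64_t        surviving;     /* bit i set if partition i survives pruning */
} PruneLevel;

/*
 * Walk the surviving partitions of one level.  Leaf partitions contribute
 * their subplan index and, if the caller asked for it, their RT index;
 * sub-partitioned tables are descended into recursively.
 */
static void
collect_matching(const PruneLevel *levels, int level,
                 uint64_t *validsubplans, uint64_t *leaf_rtis)
{
    const PruneLevel *p = &levels[level];

    for (int i = 0; i < p->nparts; i++)
    {
        if ((p->surviving & (UINT64_C(1) << i)) == 0)
            continue;           /* pruned */

        if (p->subplan_map[i] >= 0)
        {
            *validsubplans |= UINT64_C(1) << p->subplan_map[i];
            assert(p->rti_map[i] > 0);
            if (leaf_rtis != NULL)
                *leaf_rtis |= UINT64_C(1) << p->rti_map[i];
        }
        else if (p->subpart_map[i] >= 0)
            collect_matching(levels, p->subpart_map[i],
                             validsubplans, leaf_rtis);
    }
}
```

Passing NULL for leaf_rtis corresponds to the callers in nodeAppend.c and nodeMergeAppend.c, which only need the subplan set.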
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 27dee29f42..5a37c4160b 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_result_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_result_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_result_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_result_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..8cc2e2162d 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_result = part_prune_result; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_result: ExecutorDoInitialPruning() output for the plan tree
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_result, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results == NIL ? NULL :
+ linitial(portal->part_prune_results),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ PartitionPruneResult *part_prune_result = NULL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding PartitionPruneResult for
+ * this PlannedStmt.
+ */
+ if (portal->part_prune_results != NIL)
+ part_prune_result = list_nth(portal->part_prune_results,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_result,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..c8281e7201 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -790,15 +795,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_result_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_result_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +830,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is also where "initial"
+ * pruning is performed, if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_result_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +866,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /*
+ * The output list and any objects therein were allocated in the
+ * caller's (hopefully short-lived) context, so they will not stay leaked
+ * for long; still, reset the pointer so it isn't looked at by accident.
+ */
+ *part_prune_result_list = NIL;
}
/*
@@ -874,10 +899,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NULLs is returned in *part_prune_result_list, meaning that no
+ * PartitionPruneResult nodes have yet been created for the plans in
+ * stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1037,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * There are no PartitionPruneResults to add yet, but the list must
+ * still be initialized to have the same number of elements as the list
+ * of PlannedStmts.
+ */
+ *part_prune_result_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
+ }
+
return plan;
}
@@ -1126,6 +1167,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a PartitionPruneResult or a NULL is added to
+ * *part_prune_result_list if needed. It is the former if the PlannedStmt is
+ * from the existing CachedPlan that is otherwise valid and contains at least
+ * one PartitionPruneInfo that has "initial" pruning steps. Those steps are
+ * performed by calling ExecutorDoInitialPruning() to determine only those
+ * leaf partitions that need to be locked by AcquireExecutorLocks() by pruning
+ * away subplans that don't match the pruning conditions. The
+ * PartitionPruneResult contains a list of bitmapsets of the indexes of
+ * matching subplans, one for each PartitionPruneInfo.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1191,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_result_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_result_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1214,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_result_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1224,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_result_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1270,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_result_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1303,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_result_list)
+ *part_prune_result_list = my_part_prune_result_list;
+
return plan;
}
@@ -1737,17 +1797,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_result_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_result_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_result_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ PartitionPruneResult *part_prune_result = NULL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1833,37 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_result_list = lappend(*part_prune_result_list, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_result = ExecutorDoInitialPruning(plannedstmt,
+ boundParams);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ part_prune_result->scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1874,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_result_list = lappend(*part_prune_result_list,
+ part_prune_result);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 3a161bdb88..27407a7f0f 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given list of PartitionPruneResults into the portal's
+ * context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results = copyObject(part_prune_results);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..e57e133f0e 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..60d5644908 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ PartitionPruneResult *part_prune_result; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ PartitionPruneResult *part_prune_result,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..6ae897d5d1 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern PartitionPruneResult *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4a741b053f..63a89474db 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -612,6 +612,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ struct PartitionPruneResult *es_part_prune_result; /* QueryDesc.part_prune_result */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e392fb6fc0..494ae461be 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 3eb3e6e527..a1e06719e6 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1410,6 +1419,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1420,6 +1436,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1464,6 +1482,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1548,6 +1569,32 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of an ExecutorDoInitialPruning() invocation on a given
+ * PlannedStmt.
+ *
+ * Contains a list of Bitmapsets of the indexes of the subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() for every
+ * PartitionPruneInfo found in PlannedStmt.partPruneInfos. RT indexes of the
+ * leaf partitions scanned by those subplans across all PartitionPruneInfos
+ * are added into scan_leafpart_rtis.
+ *
+ * This is used by GetCachedPlan() to inform its callers of the pruning
+ * decisions made when performing AcquireExecutorLocks() on a given cached
+ * PlannedStmt, which the callers then pass on to the executor. The executor
+ * refers to this node when initializing the plan nodes which contain subplans
+ * that may have been pruned by ExecutorDoInitialPruning(), rather than
+ * redoing initial pruning.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ List *valid_subplan_offs_list;
+ Bitmapset *scan_leafpart_rtis;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..1c5bb5ece1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_result_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..9f7727a837 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results; /* list of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_result_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
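As a rough illustration of the lock-set computation that AcquireExecutorLocks() performs in the patch above (this is not code from the patch), the following self-contained sketch uses a uint64_t as a stand-in for Bitmapset and a bool array as a stand-in for the rte->rtekind check. The names RtiSet and rtis_to_lock() are hypothetical, and the sketch assumes a range table with fewer than 64 entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a Bitmapset of RT indexes: bit n set means RT index n. */
typedef uint64_t RtiSet;

/*
 * Hypothetical helper: returns the RT indexes that would be locked, that is,
 * every member of min_lock_relids plus the leaf partitions that survived
 * initial pruning, restricted to entries that are plain relations.
 *
 * is_relation[] is indexed by RT index, so it must have rtable_len + 1
 * entries; index 0 is unused because RT indexes are 1-based.
 */
static RtiSet
rtis_to_lock(RtiSet min_lock_relids, RtiSet surviving_leaf_rtis,
             const bool *is_relation, int rtable_len)
{
    RtiSet      all = min_lock_relids | surviving_leaf_rtis;
    RtiSet      locked = 0;

    assert(rtable_len >= 0 && rtable_len < 64);

    for (int rti = 1; rti <= rtable_len; rti++)
    {
        if ((all & ((RtiSet) 1 << rti)) == 0)
            continue;           /* pruned, or never part of the lock set */
        if (!is_relation[rti])
            continue;           /* e.g. a subquery or VALUES entry */
        locked |= (RtiSet) 1 << rti;
    }

    return locked;
}
```

The returned set plays the role of lockedRelids: remembering exactly what was locked lets a later release step unlock the same relations without redoing the pruning.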
[application/octet-stream] v21-0002-Allow-adding-Bitmapsets-as-Nodes-into-plan-trees.patch (5.5K, 4-v21-0002-Allow-adding-Bitmapsets-as-Nodes-into-plan-trees.patch)
From 41465f94e426a0b22b070ab8034de19cfdb6daa4 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Thu, 6 Oct 2022 17:31:37 +0900
Subject: [PATCH v21 2/3] Allow adding Bitmapsets as Nodes into plan trees
Note that this only adds some infrastructure bits and none of the
existing bitmapsets that are added to plan trees have been changed
to instead add the Node version. So, the plan trees, or really the
bitmapsets contained in them, look the same as before as far as
Node write/read functionality is concerned.
This is needed, because it is not currently possible to write and
then read back Bitmapsets that are not direct members of write/read
capable Nodes; for example, if one needs to add a List of Bitmapsets
to a plan tree. The most straightforward way to do that is to make
Bitmapsets be written with outNode() and read with nodeRead().
---
src/backend/nodes/Makefile | 3 ++-
src/backend/nodes/copyfuncs.c | 11 +++++++++++
src/backend/nodes/equalfuncs.c | 6 ++++++
src/backend/nodes/gen_node_support.pl | 1 +
src/backend/nodes/outfuncs.c | 11 +++++++++++
src/backend/nodes/readfuncs.c | 4 ++++
src/backend/optimizer/prep/preptlist.c | 1 -
src/include/nodes/bitmapset.h | 5 +++++
src/include/nodes/meson.build | 1 +
9 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/src/backend/nodes/Makefile b/src/backend/nodes/Makefile
index 7450e191ee..da5307771b 100644
--- a/src/backend/nodes/Makefile
+++ b/src/backend/nodes/Makefile
@@ -57,7 +57,8 @@ node_headers = \
nodes/replnodes.h \
nodes/supportnodes.h \
nodes/value.h \
- utils/rel.h
+ utils/rel.h \
+ nodes/bitmapset.h
# see also catalog/Makefile for an explanation of these make rules
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index e76fda8eba..1482019327 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -160,6 +160,17 @@ _copyExtensibleNode(const ExtensibleNode *from)
return newnode;
}
+/* Custom copy routine for Node bitmapsets */
+static Bitmapset *
+_copyBitmapset(const Bitmapset *from)
+{
+ Bitmapset *newnode = bms_copy(from);
+
+ newnode->type = T_Bitmapset;
+
+ return newnode;
+}
+
/*
* copyObjectImpl -- implementation of copyObject(); see nodes/nodes.h
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 0373aa30fe..e8706c461a 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -210,6 +210,12 @@ _equalList(const List *a, const List *b)
return true;
}
+/* Custom equal routine for Node bitmapsets */
+static bool
+_equalBitmapset(const Bitmapset *a, const Bitmapset *b)
+{
+ return bms_equal(a, b);
+}
/*
* equal
diff --git a/src/backend/nodes/gen_node_support.pl b/src/backend/nodes/gen_node_support.pl
index 81b8c184a9..ccb5aff874 100644
--- a/src/backend/nodes/gen_node_support.pl
+++ b/src/backend/nodes/gen_node_support.pl
@@ -71,6 +71,7 @@ my @all_input_files = qw(
nodes/supportnodes.h
nodes/value.h
utils/rel.h
+ nodes/bitmapset.h
);
# Nodes from these input files are automatically treated as nodetag_only.
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 64c65f060b..b3ffd8cec2 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -328,6 +328,17 @@ outBitmapset(StringInfo str, const Bitmapset *bms)
appendStringInfoChar(str, ')');
}
+/* Custom write routine for Node bitmapsets */
+static void
+_outBitmapset(StringInfo str, const Bitmapset *bms)
+{
+ Assert(IsA(bms, Bitmapset));
+ WRITE_NODE_TYPE("BITMAPSET");
+
+ outBitmapset(str, bms);
+}
+
+
/*
* Print the value of a Datum given its type.
*/
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index b4ff855f7c..4d6902d3ac 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -230,6 +230,10 @@ _readBitmapset(void)
result = bms_add_member(result, val);
}
+ /* XXX maybe do `result = makeNode(Bitmapset);` at the top? */
+ if (result)
+ result->type = T_Bitmapset;
+
return result;
}
diff --git a/src/backend/optimizer/prep/preptlist.c b/src/backend/optimizer/prep/preptlist.c
index 137b28323d..e5c1103316 100644
--- a/src/backend/optimizer/prep/preptlist.c
+++ b/src/backend/optimizer/prep/preptlist.c
@@ -337,7 +337,6 @@ extract_update_targetlist_colnos(List *tlist)
return update_colnos;
}
-
/*****************************************************************************
*
* TARGETLIST EXPANSION
diff --git a/src/include/nodes/bitmapset.h b/src/include/nodes/bitmapset.h
index 75b5ce1a8e..9046ca177f 100644
--- a/src/include/nodes/bitmapset.h
+++ b/src/include/nodes/bitmapset.h
@@ -20,6 +20,8 @@
#ifndef BITMAPSET_H
#define BITMAPSET_H
+#include "nodes/nodes.h"
+
/*
* Forward decl to save including pg_list.h
*/
@@ -48,6 +50,9 @@ typedef int32 signedbitmapword; /* must be the matching signed type */
typedef struct Bitmapset
{
+ pg_node_attr(custom_copy_equal, custom_read_write)
+
+ NodeTag type;
int nwords; /* number of words in array */
bitmapword words[FLEXIBLE_ARRAY_MEMBER]; /* really [nwords] */
} Bitmapset;
diff --git a/src/include/nodes/meson.build b/src/include/nodes/meson.build
index b7df232081..94701af8e1 100644
--- a/src/include/nodes/meson.build
+++ b/src/include/nodes/meson.build
@@ -19,6 +19,7 @@ node_support_input_i = [
'nodes/supportnodes.h',
'nodes/value.h',
'utils/rel.h',
+ 'nodes/bitmapset.h',
]
node_support_input = []
--
2.35.3
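The reason a type tag matters for the write/read machinery can be sketched as follows. This is a simplified, self-contained illustration and not PostgreSQL code: TagSketch, BitmapsetSketch and node_label() are hypothetical stand-ins for NodeTag, Bitmapset and the dispatch done by outNode(). The point is only that generic code can identify an object through a void pointer when the tag is the first struct member.

```c
#include <stddef.h>

/* Hypothetical stand-in for NodeTag. */
typedef enum TagSketch
{
    TAG_INVALID = 0,
    TAG_LIST,
    TAG_BITMAPSET
} TagSketch;

/* Hypothetical stand-in for a Bitmapset that carries a tag. */
typedef struct BitmapsetSketch
{
    TagSketch   type;           /* must be first so generic code can read it */
    int         nwords;
    unsigned long words[1];
} BitmapsetSketch;

/*
 * Generic dispatch on the leading tag, loosely modelled on how a writer
 * picks an output routine for an arbitrary node pointer.
 */
static const char *
node_label(const void *node)
{
    if (node == NULL)
        return "<>";

    switch (*(const TagSketch *) node)
    {
        case TAG_LIST:
            return "LIST";
        case TAG_BITMAPSET:
            return "BITMAPSET";
        default:
            return "UNKNOWN";   /* untagged objects cannot be dispatched */
    }
}
```

An untagged set stored directly in a struct field can still be written by code that knows the field's type, but an element of a generic list has only its tag to identify it, which is why a List of Bitmapsets needs the tagged form.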
* Re: generic plans and "initial" pruning
@ 2022-10-17 09:29 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-10-17 09:29 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Wed, Oct 12, 2022 at 4:36 PM Amit Langote <[email protected]> wrote:
> On Fri, Jul 29, 2022 at 1:20 PM Amit Langote <[email protected]> wrote:
> > On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <[email protected]> wrote:
> > > 0001 adds es_part_prune_result but does not use it, so maybe the
> > > introduction of that field should be deferred until it's needed for
> > > something.
> >
> > Oops, looks like a mistake when breaking the patch. Will move that bit to 0002.
>
> Fixed that and also noticed that I had defined PartitionPruneResult in
> the wrong header (execnodes.h). That led to PartitionPruneResult
> nodes not being able to be written and read, because
> src/backend/nodes/gen_node_support.pl doesn't create _out* and _read*
> routines for the nodes defined in execnodes.h. I moved its definition
> to plannodes.h, even though it is not actually the planner that
> instantiates those; no other include/nodes header sounds better.
>
> One more thing I realized is that Bitmapsets added to the List
> PartitionPruneResult.valid_subplan_offs_list are not actually
> read/write-able. That's a problem that I also faced in [1], so I
> proposed a patch there to make Bitmapset a read/write-able Node and
> mark (only) the Bitmapsets that are added into read/write-able node
> trees with the corresponding NodeTag. I'm including that patch here
> as well (0002) for the main patch to work (pass
> -DWRITE_READ_PARSE_PLAN_TREES build tests), though it might make sense
> to discuss it in its own thread?
I had second thoughts about using a List of Bitmapsets for this, and
with the change described below the make-Bitmapset-Nodes patch is no
longer needed.
I had defined PartitionPruneResult such that it stood for the results
of pruning for all PartitionPruneInfos contained in
PlannedStmt.partPruneInfos (covering all Append/MergeAppend nodes that
can use partition pruning in a given plan). So, it had a List of
Bitmapsets. I think it's better for PartitionPruneResult to
cover only one PartitionPruneInfo and thus need only a Bitmapset and
not a List thereof, which I have implemented in the attached updated
patch 0002. So, instead of passing around a single
PartitionPruneResult with each PlannedStmt, the patch now passes a List
of PartitionPruneResults, with one entry for each PartitionPruneInfo in
PlannedStmt.partPruneInfos.
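To make the change in shape concrete, here is a minimal, self-contained sketch; it is not code from the patch. It uses a plain uint64_t as a stand-in for Bitmapset and a counted array as a stand-in for List, and the names SubplanSet, PruneResultSketch, StmtPruneResultsSketch, valid_subplans_for() and subplan_is_valid() are all hypothetical. It only illustrates the idea that an Append/MergeAppend node's part_prune_index selects one result entry, with one entry per element of PlannedStmt.partPruneInfos.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a Bitmapset: bit i set means subplan i survived pruning. */
typedef uint64_t SubplanSet;

/* Hypothetical per-PartitionPruneInfo result: one set, not a list of sets. */
typedef struct PruneResultSketch
{
    SubplanSet  valid_subplans;
} PruneResultSketch;

/* Stand-in for the List that travels alongside a PlannedStmt. */
typedef struct StmtPruneResultsSketch
{
    size_t      nresults;       /* same length as partPruneInfos */
    const PruneResultSketch *results;
} StmtPruneResultsSketch;

/* Look up the result belonging to the node with the given prune index. */
static SubplanSet
valid_subplans_for(const StmtPruneResultsSketch *stmt, int part_prune_index)
{
    assert(part_prune_index >= 0);
    assert((size_t) part_prune_index < stmt->nresults);
    return stmt->results[part_prune_index].valid_subplans;
}

/* Returns 1 if the given subplan survived pruning, else 0. */
static int
subplan_is_valid(SubplanSet set, int subplan_index)
{
    assert(subplan_index >= 0 && subplan_index < 64);
    return (int) ((set >> subplan_index) & 1u);
}
```

With this layout each result holds a single set, so nothing in it is a list whose elements would themselves have to be sets.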
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v22-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (17.2K, 2-v22-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch)
From 27db8ab066dace77953d71a6446788190b66ce60 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v22 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set instead in the
latter is an index field which points to the list element
of PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be accessible directly, by simply
iterating over PlannedStmt.partPruneInfos, than to require a walk
of the plan tree to find them.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 90 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d78862e660..32475e33ff 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 99512826c5..aca0c6f323 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..80197d5141 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1791,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..21f4c10937 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ac86ce9003..50a5719ac6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5d0fd6e072..31fff597a7 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,6 +519,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..720f20f563 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1658,21 +1681,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1734,21 +1748,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6188bf69cb..6565b6ed01 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..4a741b053f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 6bda383bea..e392fb6fc0 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -503,6 +506,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 21e642a64c..3eb3e6e527 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -70,6 +70,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
@@ -270,8 +273,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -305,8 +308,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
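For readers following the refactoring in the patch above, here is a minimal standalone sketch of the index scheme it introduces: plan nodes carry an integer index instead of a pointer, and the index is shifted when per-query lists are concatenated into one global list. Everything in it (InfoList, ToyAppend, and the toy_* helpers) is hypothetical and only loosely models make_partition_pruneinfo(), set_append_references(), and the tail of set_plan_references(); it is not PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INFOS 16

/* Hypothetical stand-in for a List of PartitionPruneInfo nodes. */
typedef struct InfoList
{
    int         nitems;
    const char *items[MAX_INFOS];
} InfoList;

/* Hypothetical stand-in for an Append/MergeAppend plan node. */
typedef struct ToyAppend
{
    int         part_prune_index;   /* -1 means no run-time pruning */
} ToyAppend;

/* Append an item and return its 0-based position in the list. */
static int
info_list_append(InfoList *list, const char *item)
{
    assert(list->nitems < MAX_INFOS);
    list->items[list->nitems] = item;
    return list->nitems++;
}

/* Register a pruning info in the per-query list, or return -1 if none. */
static int
toy_make_pruneinfo(InfoList *root_infos, const char *info)
{
    if (info == NULL)
        return -1;
    return info_list_append(root_infos, info);
}

/* Shift a per-query index by the number of infos already in the global list. */
static void
toy_fix_index(ToyAppend *node, const InfoList *glob_infos)
{
    if (node->part_prune_index >= 0)
        node->part_prune_index += glob_infos->nitems;
}

/* Move the per-query infos onto the end of the global list. */
static void
toy_flatten(InfoList *glob_infos, const InfoList *root_infos)
{
    for (int i = 0; i < root_infos->nitems; i++)
        info_list_append(glob_infos, root_infos->items[i]);
}

/* Executor-side lookup by index; NULL when the node has no pruning info. */
static const char *
toy_lookup(const InfoList *glob_infos, const ToyAppend *node)
{
    if (node->part_prune_index < 0)
        return NULL;
    assert(node->part_prune_index < glob_infos->nitems);
    return glob_infos->items[node->part_prune_index];
}
```

The ordering matters in this sketch: each node's index is shifted before its own query's infos are appended to the global list, so the offset counts only infos contributed by queries processed earlier.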
[application/octet-stream] v22-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (82.3K, 3-v22-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From 5f2d5ca36111f8007a7850fd985c7e965d621149 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v22 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan and skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way before the
actual execution has started is made available to the actual execution via
PartitionPruneResult, made available along with the PlannedStmt by the
callers of the executor that used plancache.c to get the plan. It is NULL
in the cases in which the plan is obtained by calling the planner
directly or if the plan obtained by plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 51 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 241 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 782 insertions(+), 100 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 2527e66059..fb8779fec0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 6b6720c690..06dfcd4d84 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index c4b54d0547..b469e05672 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..f14f9197b5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed to
+figure out the minimal set of child subplans that satisfy those pruning steps.
+AcquireExecutorLocks() looking at a given generic plan will then lock only the
+relations scanned by the child subplans that survived such pruning, along with
+those present in PlannedStmt.minLockRelids. Note that the subplans are only
+notionally pruned, that is, they are not removed from the plan tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a List
+of PartitionPruneResult nodes via the QueryDesc. Each PartitionPruneResult
+consists of the set of indexes of surviving subplans in the respective parent
+plan node's (the one to which the corresponding PartitionPruneInfo belongs)
+list of child subplans, saved as a bitmapset (valid_subplan_offs). In other
+words, the executor executing a generic plan should not re-evaluate the set of
+initially valid subplans for a given plan node by redoing the initial pruning
+if it was already done by AcquireExecutorLocks() when validating the plan.
+Such re-evaluation of the pruning steps may very well end up resulting in a
+different set of subplans, containing some whose relations were not locked by
+AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 32475e33ff..b59474841f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,54 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a List of PartitionPruneResult nodes, one for every
+ * PartitionPruneInfo, each containing the bitmapset for its plan node. The RT
+ * indexes of all the leaf partitions scanned by the chosen subplans are
+ * returned in *scan_leafpart_rtis, shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +876,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 80197d5141..8728745c44 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1746,8 +1752,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1764,6 +1772,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * to be executed of the parent plan node to which the PartitionPruneInfo
+ * belongs and also the set of the RT indexes of leaf partitions that will
+ * be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1781,8 +1796,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,28 +1810,65 @@ ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
+
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1823,7 +1876,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1839,11 +1893,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. That's okay because the initial pruning steps do
+ * not contain anything that requires execution to have started, and
+ * thus need none of the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1857,19 +1974,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1924,15 +2043,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as, when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1946,6 +2092,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1956,6 +2103,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2006,6 +2155,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2013,6 +2164,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2034,7 +2186,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2044,7 +2196,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2272,10 +2424,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2310,7 +2466,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2324,6 +2480,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2334,13 +2492,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2367,8 +2527,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2376,7 +2542,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 21f4c10937..67a58c7163 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -134,6 +134,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index e134a82ff7..18d3b98cdc 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index b4ff855f7c..77990a2732 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -795,7 +800,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 31fff597a7..4097cf7164 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..61d6934978 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6565b6ed01..37f3e6af61 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we also determine whether the 2nd pass is
+ * necessary by noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a9a1851c94..a1be8179e8 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..226ee81b63 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..957221c47e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that
+ * no partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * There are no PartitionPruneResults to add yet, but the list must
+ * still be initialized to have the same number of elements as the
+ * list of PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResult or a NIL is added to
+ * *part_prune_results_list. The former if the PlannedStmt is from
+ * the existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * those "initial" steps are performed by calling ExecutorDoInitialPruning()
+ * to determine only those leaf partitions that need to be locked by
+ * AcquireExecutorLocks() by pruning away subplans that don't match the
+ * "initial" pruning conditions. For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the bitmapset
+ * of the indexes of surviving subplans is added to the List for the
+ * PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 3a161bdb88..4b156de524 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..c9a5e5fb68 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4a741b053f..521a60b988 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -612,6 +612,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index e392fb6fc0..494ae461be 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 3eb3e6e527..0bc4c8130a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1410,6 +1419,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1420,6 +1436,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1464,6 +1482,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1548,6 +1569,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before the execution has started. A module that needs to do so
+ * should call ExecutorDoInitialPruning() on a given PlannedStmt, which
+ * returns a List of PartitionPruneResult containing an entry for each
+ * PartitionPruneInfo present in PlannedStmt.partPruneInfos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
* Re: generic plans and "initial" pruning
@ 2022-10-27 02:41 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-10-27 02:41 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Mon, Oct 17, 2022 at 6:29 PM Amit Langote <[email protected]> wrote:
> On Wed, Oct 12, 2022 at 4:36 PM Amit Langote <[email protected]> wrote:
> > On Fri, Jul 29, 2022 at 1:20 PM Amit Langote <[email protected]> wrote:
> > > On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <[email protected]> wrote:
> > > > 0001 adds es_part_prune_result but does not use it, so maybe the
> > > > introduction of that field should be deferred until it's needed for
> > > > something.
> > >
> > > Oops, looks like a mistake when breaking the patch. Will move that bit to 0002.
> >
> > Fixed that and also noticed that I had defined PartitionPruneResult in
> > the wrong header (execnodes.h). That led to PartitionPruneResult
> > nodes not being able to be written and read, because
> > src/backend/nodes/gen_node_support.pl doesn't create _out* and _read*
> > routines for the nodes defined in execnodes.h. I moved its definition
> > to plannodes.h, even though it is not actually the planner that
> > instantiates those; no other include/nodes header sounds better.
> >
> > One more thing I realized is that Bitmapsets added to the List
> > PartitionPruneResult.valid_subplan_offs_list are not actually
> > read/write-able. That's a problem that I also faced in [1], so I
> > proposed a patch there to make Bitmapset a read/write-able Node and
> > mark (only) the Bitmapsets that are added into read/write-able node
> > trees with the corresponding NodeTag. I'm including that patch here
> > as well (0002) for the main patch to work (pass
> > -DWRITE_READ_PARSE_PLAN_TREES build tests), though it might make sense
> > to discuss it in its own thread?
>
> Had second thoughts on the use of List of Bitmapsets for this, such
> that the make-Bitmapset-Nodes patch is no longer needed.
>
> I had defined PartitionPruneResult such that it stood for the results
> of pruning for all PartitionPruneInfos contained in
> PlannedStmt.partPruneInfos (covering all Append/MergeAppend nodes that
> can use partition pruning in a given plan). So, it had a List of
> Bitmapset. I think it's perhaps better for PartitionPruneResult to
> cover only one PartitionPruneInfo and thus need only a Bitmapset and
> not a List thereof, which I have implemented in the attached updated
> patch 0002. So, instead of needing to pass around a
> PartitionPruneResult with each PlannedStmt, this now passes a List of
> PartitionPruneResult with an entry for each in
> PlannedStmt.partPruneInfos.
Rebased over 3b2db22fe.
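To picture the layout described in the quoted text above (one
PartitionPruneResult per PartitionPruneInfo, found through the same index
that an Append/MergeAppend node stores in part_prune_index), here is a
minimal standalone sketch. It is not code from the patch: SketchPruneResult,
sketch_lookup_result() and sketch_subplan_survives() are hypothetical names,
a plain array stands in for the List, and a 64-bit mask stands in for the
Bitmapset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for PartitionPruneResult: the outcome of "initial"
 * pruning for exactly one PartitionPruneInfo.  Bit i of the mask is set if
 * subplan i survived pruning (the real struct uses a Bitmapset).
 */
typedef struct SketchPruneResult
{
	uint64_t	valid_subplan_offs;
} SketchPruneResult;

/*
 * Look up the result for a plan node using the same index the node would use
 * to find its PartitionPruneInfo.  Returns NULL when the node has no pruning
 * info (negative index) or the index is out of range.
 */
static const SketchPruneResult *
sketch_lookup_result(const SketchPruneResult *results, size_t nresults,
					 int part_prune_index)
{
	if (part_prune_index < 0 || (size_t) part_prune_index >= nresults)
		return NULL;
	return &results[part_prune_index];
}

/* Did the subplan at the given offset survive initial pruning? */
static int
sketch_subplan_survives(const SketchPruneResult *result, int subplan_off)
{
	assert(result != NULL);
	assert(subplan_off >= 0 && subplan_off < 64);
	return (int) ((result->valid_subplan_offs >> subplan_off) & 1u);
}
```

Because the results are index-aligned with the pruning infos, a node only
needs its own index to find both, with no walk of the plan tree.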
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v23-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (17.2K, 2-v23-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch)
From c805965cadc12217406309221e2c89e3c17be433 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v23 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set instead in the
latter is an index field which points to the list element
of PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be directly accessible than to require
a walk of the plan tree to find them; with this change, finding
them only takes iterating over PlannedStmt.partPruneInfos.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 90 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d78862e660..32475e33ff 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 99512826c5..aca0c6f323 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..80197d5141 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1791,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..21f4c10937 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ac86ce9003..50a5719ac6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 78a8174534..240d50f1c0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,6 +519,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..720f20f563 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1658,21 +1681,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1734,21 +1748,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6188bf69cb..6565b6ed01 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..4a741b053f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 09342d128d..fbe75dca0f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -503,6 +506,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 5c2ab1b379..2e132afc5a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -70,6 +70,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
@@ -270,8 +273,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -305,8 +308,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index to PlannerInfo.partPruneInfos or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
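
For readers following the indexing scheme in the patch above: the standalone C sketch below mimics how a query-local part_prune_index becomes an index into the flat, plan-wide list. It is only an illustration under assumptions. DemoPruneInfo, DemoList and every demo_* function are hypothetical stand-ins and are not PostgreSQL APIs; only the ordering (shift the index first, then append the local entries to the global list) is taken from the setrefs.c hunks.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_INFOS 16

/* Hypothetical stand-in for a PartitionPruneInfo; it carries only an id. */
typedef struct DemoPruneInfo
{
    int         id;
} DemoPruneInfo;

/* Hypothetical stand-in for a List of PartitionPruneInfo pointers. */
typedef struct DemoList
{
    int         n;
    DemoPruneInfo *items[DEMO_MAX_INFOS];
} DemoList;

/*
 * Append an entry and return its 0-based index, in the way the patched
 * make_partition_pruneinfo() returns list_length(...) - 1.
 */
static int
demo_append(DemoList *list, DemoPruneInfo *info)
{
    assert(list->n < DEMO_MAX_INFOS);
    list->items[list->n] = info;
    return list->n++;
}

/*
 * Shift a query-local index by the number of entries already present in
 * the global list; -1 ("no run-time pruning") is left untouched.
 */
static int
demo_globalize_index(int local_index, const DemoList *global)
{
    return (local_index >= 0) ? local_index + global->n : -1;
}

/* Move one query level's entries onto the end of the global list. */
static void
demo_flush_to_global(DemoList *global, const DemoList *local)
{
    for (int i = 0; i < local->n; i++)
        demo_append(global, local->items[i]);
}

/* Executor-side lookup, analogous to list_nth() on the flat list. */
static DemoPruneInfo *
demo_lookup(const DemoList *global, int index)
{
    assert(index >= 0 && index < global->n);
    return global->items[index];
}
```

In this sketch, a subquery that registered one entry keeps global index 0, and a later query level that registered two entries sees its local indexes 0 and 1 shifted to 1 and 2, so each plan node can still find its own entry with a single integer.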
[application/octet-stream] v23-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (82.3K, 3-v23-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From ae9a6b7186c77888fd85dd7e4056dd3cd607617c Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v23 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan and skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way, before the
actual execution has started, is passed to the executor as a
PartitionPruneResult, which the callers of the executor that used
plancache.c to get the plan supply along with the PlannedStmt. It is NULL
when the plan is obtained by calling the planner directly or when the plan
obtained from plancache.c is not a generic one.
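
As a rough illustration of the idea described in this commit message, the toy C sketch below prunes first and then collects only the surviving relations for locking. It rests on stated assumptions: DemoSubplan, demo_initial_prune() and demo_rels_to_lock() are hypothetical names, range partitions are modeled as half-open integer intervals, and a 32-bit mask stands in for a Bitmapset. None of this is the patch's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: one child subplan scanning one range partition [lo, hi). */
typedef struct DemoSubplan
{
    int         rti;            /* range table index of the partition */
    int         lo;             /* inclusive lower bound */
    int         hi;             /* exclusive upper bound */
} DemoSubplan;

/*
 * "Initial" pruning: uses only a parameter value that is known before
 * execution starts.  Returns a bitmask of surviving subplan offsets.
 */
static uint32_t
demo_initial_prune(const DemoSubplan *subplans, int nsubplans, int param)
{
    uint32_t    valid = 0;

    assert(nsubplans <= 32);
    for (int i = 0; i < nsubplans; i++)
    {
        if (param >= subplans[i].lo && param < subplans[i].hi)
            valid |= (uint32_t) 1 << i;
    }
    return valid;
}

/*
 * Collect the RT indexes that would need locking: only those scanned by
 * subplans that survived pruning.  Returns the number of entries written.
 */
static int
demo_rels_to_lock(const DemoSubplan *subplans, int nsubplans,
                  uint32_t valid, int *rtis_out)
{
    int         n = 0;

    for (int i = 0; i < nsubplans; i++)
    {
        if (valid & ((uint32_t) 1 << i))
            rtis_out[n++] = subplans[i].rti;
    }
    return n;
}
```

The mask returned by demo_initial_prune() plays the role of the pruning result: in this sketch a caller would keep it and hand it to the execution step instead of pruning again, so that execution never touches a partition that was left unlocked.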
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 51 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 241 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 782 insertions(+), 100 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 2527e66059..fb8779fec0 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1a62e5dac5..cc36b6fd15 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..f14f9197b5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed to
+figure out the minimal set of child subplans that satisfy those pruning steps.
+AcquireExecutorLocks() looking at a given generic plan will then lock only the
+relations scanned by the child subplans that survived such pruning, along with
+those present in PlannedStmt.minLockRelids. Note that the subplans are only
+notionally pruned, that is, they are not removed from the plan tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a List
+of PartitionPruneResult nodes via the QueryDesc. Each PartitionPruneResult
+consists of the set of indexes of surviving subplans in the respective parent
+plan node's (the one to which the corresponding PartitionPruneInfo belongs)
+list of child subplans, saved as a bitmapset (valid_subplan_offs). In other
+words, the executor executing a generic plan should not re-evaluate the set of
+initially valid subplans for a given plan node by redoing the initial pruning
+if it was already done by AcquireExecutorLocks() when validating the plan.
+Such re-evaluation of the pruning steps may very well end up resulting in a
+different set of subplans, containing some whose relations were not locked by
+AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 32475e33ff..b59474841f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,54 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a List of PartitionPruneResult nodes, one for every
+ * PartitionPruneInfo, each containing the bitmapset described above. The RT
+ * indexes of all the leaf partitions scanned by the chosen subplans are
+ * returned in *scan_leafpart_rtis, which is shared across all
+ * PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +876,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 80197d5141..8728745c44 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1746,8 +1752,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or in ExecutorDoInitialPruning(), which
+ * runs as part of AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1764,6 +1772,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * of the owning parent plan node that need to be executed, and also
+ * the set of RT indexes of the leaf partitions that those subplans
+ * will scan.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1781,8 +1796,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,28 +1810,65 @@ ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
+
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1823,7 +1876,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1839,11 +1893,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. That's okay because the initial pruning steps
+ * contain nothing that requires execution to have started, and thus
+ * nothing that needs the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1857,19 +1974,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1924,15 +2043,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1946,6 +2092,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1956,6 +2103,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2006,6 +2155,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2013,6 +2164,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2034,7 +2186,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2044,7 +2196,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2272,10 +2424,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2310,7 +2466,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2324,6 +2480,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2334,13 +2492,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2367,8 +2527,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2376,7 +2542,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 21f4c10937..67a58c7163 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -134,6 +134,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index e134a82ff7..18d3b98cdc 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index b4ff855f7c..77990a2732 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -795,7 +800,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 240d50f1c0..b7801ea04c 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..61d6934978 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6565b6ed01..37f3e6af61 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * The first pass also determines whether the 2nd pass is necessary by
+ * checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a9a1851c94..a1be8179e8 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 5aa5a350f3..226ee81b63 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 0d6a295674..957221c47e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that no
+ * partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResult or a NIL is added to
+ * *part_prune_results_list. The former if the PlannedStmt is from
+ * the existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * those "initial" steps are performed by calling ExecutorDoInitialPruning()
+ * to determine only those leaf partitions that need to be locked by
+ * AcquireExecutorLocks() by pruning away subplans that don't match the
+ * "initial" pruning conditions. For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the bitmapset
+ * of the indexes of surviving subplans is added to the List for the
+ * PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index c3e95346b6..74950bd163 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ AssertArg(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..c9a5e5fb68 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4a741b053f..521a60b988 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -612,6 +612,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index fbe75dca0f..354c2e96c3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 2e132afc5a..c0717bf45e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1410,6 +1419,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Do any of the PartitionedRelPruneInfos in
+ * prune_infos have initial_pruning_steps set?
+ *
+ * needs_exec_pruning Do any of the PartitionedRelPruneInfos in
+ * prune_infos have exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1420,6 +1436,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1464,6 +1482,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1548,6 +1569,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before the execution has started. A module that needs to do so
+ * should call ExecutorDoInitialPruning() on a given PlannedStmt, which
+ * returns a List of PartitionPruneResult containing an entry for each
+ * PartitionPruneInfo present in PlannedStmt.partPruneInfos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
* Re: generic plans and "initial" pruning
@ 2022-11-08 06:22 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-11-08 06:22 UTC (permalink / raw)
To: Robert Haas <[email protected]>; +Cc: Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Thu, Oct 27, 2022 at 11:41 AM Amit Langote <[email protected]> wrote:
> On Mon, Oct 17, 2022 at 6:29 PM Amit Langote <[email protected]> wrote:
> > On Wed, Oct 12, 2022 at 4:36 PM Amit Langote <[email protected]> wrote:
> > > On Fri, Jul 29, 2022 at 1:20 PM Amit Langote <[email protected]> wrote:
> > > > On Thu, Jul 28, 2022 at 1:27 AM Robert Haas <[email protected]> wrote:
> > > > > 0001 adds es_part_prune_result but does not use it, so maybe the
> > > > > introduction of that field should be deferred until it's needed for
> > > > > something.
> > > >
> > > > Oops, looks like a mistake when breaking the patch. Will move that bit to 0002.
> > >
> > > Fixed that and also noticed that I had defined PartitionPruneResult in
> > > the wrong header (execnodes.h). That led to PartitionPruneResult
> > > nodes not being able to be written and read, because
> > > src/backend/nodes/gen_node_support.pl doesn't create _out* and _read*
> > > routines for the nodes defined in execnodes.h. I moved its definition
> > > to plannodes.h, even though it is not actually the planner that
> > > instantiates those; no other include/nodes header sounds better.
> > >
> > > One more thing I realized is that Bitmapsets added to the List
> > > PartitionPruneResult.valid_subplan_offs_list are not actually
> > > read/write-able. That's a problem that I also faced in [1], so I
> > > proposed a patch there to make Bitmapset a read/write-able Node and
> > > mark (only) the Bitmapsets that are added into read/write-able node
> > > trees with the corresponding NodeTag. I'm including that patch here
> > > as well (0002) for the main patch to work (pass
> > > -DWRITE_READ_PARSE_PLAN_TREES build tests), though it might make sense
> > > to discuss it in its own thread?
> >
> > Had second thoughts on the use of List of Bitmapsets for this, such
> > that the make-Bitmapset-Nodes patch is no longer needed.
> >
> > I had defined PartitionPruneResult such that it stood for the results
> > of pruning for all PartitionPruneInfos contained in
> > PlannedStmt.partPruneInfos (covering all Append/MergeAppend nodes that
> > can use partition pruning in a given plan). So, it had a List of
> > Bitmapset. I think it's perhaps better for PartitionPruneResult to
> > cover only one PartitionPruneInfo and thus need only a Bitmapset and
> > not a List thereof, which I have implemented in the attached updated
> > patch 0002. So, instead of needing to pass around a
> > PartitionPruneResult with each PlannedStmt, this now passes a List of
> > PartitionPruneResult with an entry for each in
> > PlannedStmt.partPruneInfos.
>
> Rebased over 3b2db22fe.
Updated 0002 to cope with AssertArg() being removed from the tree.
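To make the locking rule concrete, here is a minimal, self-contained sketch; it is not code from the patch. It assumes a 64-bit mask as a stand-in for Bitmapset, and the names RelidSet, PruneInfoSketch, PruneResultSketch, and relids_to_lock() are hypothetical. It shows the set of range table indexes to lock being computed as minLockRelids plus the leaf partitions whose subplans survive initial pruning, with one result entry per PartitionPruneInfo.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SUBPLANS 8

/* Hypothetical stand-in for Bitmapset: bit i set means "i is a member". */
typedef uint64_t RelidSet;

/* Hypothetical, simplified view of one prunable Append/MergeAppend. */
typedef struct PruneInfoSketch
{
	int			nsubplans;
	int			rti_map[MAX_SUBPLANS];	/* RT index scanned by each subplan */
} PruneInfoSketch;

/* Hypothetical per-PartitionPruneInfo result: surviving subplan offsets. */
typedef struct PruneResultSketch
{
	RelidSet	valid_subplan_offs;
} PruneResultSketch;

/*
 * Start from the relations that must always be locked, then add only the
 * leaf partitions whose subplans survived initial pruning.
 */
static RelidSet
relids_to_lock(RelidSet min_lock_relids,
			   const PruneInfoSketch *infos,
			   const PruneResultSketch *results,
			   int ninfos)
{
	RelidSet	lock = min_lock_relids;

	for (int i = 0; i < ninfos; i++)
	{
		assert(infos[i].nsubplans <= MAX_SUBPLANS);
		for (int off = 0; off < infos[i].nsubplans; off++)
		{
			int			rti = infos[i].rti_map[off];

			/* RT indexes start at 1; keep the shift within 64 bits. */
			assert(rti > 0 && rti < 64);
			if (results[i].valid_subplan_offs & (UINT64_C(1) << off))
				lock |= UINT64_C(1) << rti;
		}
	}
	return lock;
}
```

For example, with the parent at RT index 1 in the always-lock set, four leaf partitions at RT indexes 2..5, and only subplan offset 2 surviving, the function returns a set containing RT indexes 1 and 4, so the partitions at RT indexes 2, 3, and 5 are never locked.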
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v24-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (82.3K, 2-v24-0002-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From 8f6456d27efb8719a7dd8a52bf0ad3c5033b31a3 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v24 2/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning so that it can notionally eliminate the subnodes of a
generic cached plan that need not be initialized during the actual
execution of the plan, and skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way, before
execution has started, is passed to the executor as a List of
PartitionPruneResult, which callers that used plancache.c to get the
plan supply along with the PlannedStmt. The List is NIL when the plan
is obtained by calling the planner directly or when the plan obtained
from plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 51 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 241 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 782 insertions(+), 100 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 1a62e5dac5..cc36b6fd15 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -776,7 +776,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 0b5183fc4a..f14f9197b5 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed to
+figure out the minimal set of child subplans that satisfy those pruning steps.
+AcquireExecutorLocks() looking at a given generic plan will then lock only the
+relations scanned by the child subplans that survived such pruning, along with
+those present in PlannedStmt.minLockRelids. Note that the subplans are only
+notionally pruned, that is, they are not removed from the plan tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a List
+of PartitionPruneResult nodes via the QueryDesc. Each PartitionPruneResult
+consists of the set of indexes of surviving subplans in the respective parent
+plan node's (the one to which the corresponding PartitionPruneInfo belongs)
+list of child subplans, saved as a bitmapset (valid_subplan_offs). In other
+words, the executor executing a generic plan should not re-evaluate the set of
+initially valid subplans for a given plan node by redoing the initial pruning
+if it was already done by AcquireExecutorLocks() when validating the plan.
+Such re-evaluation of the pruning steps may very well end up resulting in a
+different set of subplans, containing some whose relations were not locked by
+AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree, the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 32475e33ff..b59474841f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,54 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a List of PartitionPruneResult nodes, one for every
+ * PartitionPruneInfo, each containing the bitmapset described above. The RT
+ * indexes of all the leaf partitions scanned by the chosen subplans are added
+ * to *scan_leafpart_rtis, which is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * corresponding PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * List of PartitionPruneResult nodes to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +876,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied List of PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 80197d5141..8728745c44 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1746,8 +1752,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1764,6 +1772,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * to be executed of the parent plan node to which the PartitionPruneInfo
+ * belongs and also the set of the RT indexes of leaf partitions that will
+ * be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1781,8 +1796,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,28 +1810,65 @@ ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ PartitionPruneResult *pruneresult = NULL;
+ bool do_pruning = (pruneinfo->needs_init_pruning ||
+ pruneinfo->needs_exec_pruning);
+
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ do_pruning = pruneinfo->needs_exec_pruning;
+ }
- /* We may need an expression context to evaluate partition exprs */
- ExecAssignExprContext(estate, planstate);
+ if (do_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL, true,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1823,7 +1876,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1839,11 +1893,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started and thus need the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1857,19 +1974,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1924,15 +2043,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before the
+ * execution has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1946,6 +2092,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1956,6 +2103,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2006,6 +2155,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2013,6 +2164,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2034,7 +2186,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2044,7 +2196,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2272,10 +2424,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2310,7 +2466,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2324,6 +2480,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2334,13 +2492,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2367,8 +2527,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2376,7 +2542,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 21f4c10937..67a58c7163 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -134,6 +134,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index e134a82ff7..18d3b98cdc 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..96880e122a 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -155,7 +155,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -577,7 +578,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -642,7 +643,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -717,7 +718,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -868,7 +869,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..2312e5a633 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -103,7 +103,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -218,7 +219,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index b4ff855f7c..77990a2732 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -795,7 +800,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 799602f5ea..a96d316dca 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..61d6934978 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
foreach(l, pruneinfo->prune_infos)
@@ -362,15 +373,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also adjust the RT indexes of leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6565b6ed01..37f3e6af61 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -341,6 +352,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -441,13 +454,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -458,6 +476,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -545,6 +567,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -619,6 +644,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -646,6 +677,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -658,6 +690,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -672,6 +705,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -696,6 +730,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..95ab1d0eef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..280ed7d239 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..af6fae6e3b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that no
+ * partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * There are no PartitionPruneResults to add yet, but the list must still
+ * have the same number of elements as the list of PlannedStmts, so fill
+ * it with NILs.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResult or a NIL is added to
+ * *part_prune_results_list. The former if the PlannedStmt is from
+ * the existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * those "initial" steps are performed by calling ExecutorDoInitialPruning()
+ * to determine only those leaf partitions that need to be locked by
+ * AcquireExecutorLocks() by pruning away subplans that don't match the
+ * "initial" pruning conditions. For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the bitmapset
+ * of the indexes of surviving subplans is added to the List for the
+ * PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..5b9098971b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ Assert(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..bd8776402e 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -126,5 +128,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..c9a5e5fb68 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 4a741b053f..521a60b988 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -612,6 +612,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index fbe75dca0f..354c2e96c3 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 2e132afc5a..c0717bf45e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1410,6 +1419,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1420,6 +1436,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1464,6 +1482,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1548,6 +1569,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before the execution has started. A module that needs to do so
+ * should call ExecutorDoInitialPruning() on a given PlannedStmt, which
+ * returns a List of PartitionPruneResult containing an entry for each
+ * PartitionPruneInfo present in PlannedStmt.partPruneInfos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
[application/octet-stream] v24-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch (17.2K, 3-v24-0001-Move-PartitioPruneInfo-out-of-plan-nodes-into-Pl.patch)
From 9819109681e87342bf22549f5ea316501f77235d Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 27 May 2022 16:00:28 +0900
Subject: [PATCH v24 1/2] Move PartitionPruneInfo out of plan nodes into
PlannedStmt
The planner will now add a given PartitionPruneInfo to
PlannedStmt.partPruneInfos instead of directly to the
Append/MergeAppend plan node. What gets set in the latter
instead is an index field that points to the list element
of PlannedStmt.partPruneInfos containing the PartitionPruneInfo
belonging to the plan node.
A later commit will make AcquireExecutorLocks() do the initial
partition pruning to determine a minimal set of partitions to be
locked when validating a plan tree, and it will need to consult the
PartitionPruneInfos referenced therein to do so. It is better for
the PartitionPruneInfos to be directly accessible than to require
a walk of the plan tree to find them; with this change, they can be
found by simply iterating over PlannedStmt.partPruneInfos.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 4 +-
src/backend/executor/execUtils.c | 1 +
src/backend/executor/nodeAppend.c | 4 +-
src/backend/executor/nodeMergeAppend.c | 4 +-
src/backend/optimizer/plan/createplan.c | 24 ++++-----
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 65 +++++++++++++------------
src/backend/partitioning/partprune.c | 18 ++++---
src/include/executor/execPartition.h | 3 +-
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 6 +++
src/include/nodes/plannodes.h | 11 +++--
src/include/partitioning/partprune.h | 8 +--
15 files changed, 90 insertions(+), 62 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index d78862e660..32475e33ff 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -825,6 +825,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
ExecInitRangeTable(estate, rangeTable);
estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 99512826c5..aca0c6f323 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -183,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
pstmt->planTree = plan;
+ pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
pstmt->resultRelations = NIL;
pstmt->appendRelations = NIL;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 40e3c07693..80197d5141 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1791,11 +1791,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
EState *estate = planstate->state;
+ PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
+ part_prune_index);
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9df1f81ea8..21f4c10937 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -119,6 +119,7 @@ CreateExecutorState(void)
estate->es_relations = NULL;
estate->es_rowmarks = NULL;
estate->es_plannedstmt = NULL;
+ estate->es_part_prune_infos = NIL;
estate->es_junkFilter = NULL;
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 357e10a1d7..c6f86a6510 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -134,7 +134,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
appendstate->as_begun = false;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -145,7 +145,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index c5c62fa5c7..8d35860c30 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -82,7 +82,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
mergestate->ps.ExecProcNode = ExecMergeAppend;
/* If run-time partition pruning is enabled, then set that up now */
- if (node->part_prune_info != NULL)
+ if (node->part_prune_index >= 0)
{
PartitionPruneState *prunestate;
@@ -93,7 +93,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
*/
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
- node->part_prune_info,
+ node->part_prune_index,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ac86ce9003..50a5719ac6 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -1203,7 +1203,6 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
ListCell *subpaths;
int nasyncplans = 0;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
int nodenumsortkeys = 0;
AttrNumber *nodeSortColIdx = NULL;
Oid *nodeSortOperators = NULL;
@@ -1354,6 +1353,9 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ plan->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1377,16 +1379,14 @@ create_append_plan(PlannerInfo *root, AppendPath *best_path, int flags)
}
if (prunequal != NIL)
- partpruneinfo =
- make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ plan->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
plan->appendplans = subplans;
plan->nasyncplans = nasyncplans;
plan->first_partial_plan = best_path->first_partial_path;
- plan->part_prune_info = partpruneinfo;
copy_generic_path_info(&plan->plan, (Path *) best_path);
@@ -1425,7 +1425,6 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
List *subplans = NIL;
ListCell *subpaths;
RelOptInfo *rel = best_path->path.parent;
- PartitionPruneInfo *partpruneinfo = NULL;
/*
* We don't have the actual creation of the MergeAppend node split out
@@ -1518,6 +1517,9 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
subplans = lappend(subplans, subplan);
}
+ /* Set below if we find quals that we can use to run-time prune */
+ node->part_prune_index = -1;
+
/*
* If any quals exist, they may be useful to perform further partition
* pruning during execution. Gather information needed by the executor to
@@ -1541,13 +1543,13 @@ create_merge_append_plan(PlannerInfo *root, MergeAppendPath *best_path,
}
if (prunequal != NIL)
- partpruneinfo = make_partition_pruneinfo(root, rel,
- best_path->subpaths,
- prunequal);
+ node->part_prune_index = make_partition_pruneinfo(root, rel,
+ best_path->subpaths,
+ prunequal);
}
node->mergeplans = subplans;
- node->part_prune_info = partpruneinfo;
+
/*
* If prepare_sort_from_pathkeys added sort columns, but we were told to
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 493a3af0fa..799602f5ea 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -519,6 +519,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->dependsOnRole = glob->dependsOnRole;
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
+ result->partPruneInfos = glob->partPruneInfos;
result->rtable = glob->finalrtable;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1cb0abdbc1..720f20f563 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -348,6 +348,29 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /* Also fix up the information in PartitionPruneInfos. */
+ foreach (lc, root->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ ListCell *l;
+
+ foreach(l, pruneinfo->prune_infos)
+ {
+ List *prune_infos = lfirst(l);
+ ListCell *l2;
+
+ foreach(l2, prune_infos)
+ {
+ PartitionedRelPruneInfo *pinfo = lfirst(l2);
+
+ /* RT index of the table to which the pinfo belongs. */
+ pinfo->rtindex += rtoffset;
+ }
+ }
+
+ glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ }
+
return result;
}
@@ -1658,21 +1681,12 @@ set_append_references(PlannerInfo *root,
aplan->apprelids = offset_relid_set(aplan->apprelids, rtoffset);
- if (aplan->part_prune_info)
- {
- foreach(l, aplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (aplan->part_prune_index >= 0)
+ aplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(aplan->plan.lefttree == NULL);
@@ -1734,21 +1748,12 @@ set_mergeappend_references(PlannerInfo *root,
mplan->apprelids = offset_relid_set(mplan->apprelids, rtoffset);
- if (mplan->part_prune_info)
- {
- foreach(l, mplan->part_prune_info->prune_infos)
- {
- List *prune_infos = lfirst(l);
- ListCell *l2;
-
- foreach(l2, prune_infos)
- {
- PartitionedRelPruneInfo *pinfo = lfirst(l2);
-
- pinfo->rtindex += rtoffset;
- }
- }
- }
+ /*
+ * PartitionPruneInfos will be added to a list in PlannerGlobal, so update
+ * the index.
+ */
+ if (mplan->part_prune_index >= 0)
+ mplan->part_prune_index += list_length(root->glob->partPruneInfos);
/* We don't need to recurse to lefttree or righttree ... */
Assert(mplan->plan.lefttree == NULL);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6188bf69cb..6565b6ed01 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -209,16 +209,20 @@ static void partkey_datum_from_expr(PartitionPruneContext *context,
/*
* make_partition_pruneinfo
- * Builds a PartitionPruneInfo which can be used in the executor to allow
- * additional partition pruning to take place. Returns NULL when
- * partition pruning would be useless.
+ * Checks if the given set of quals can be used to build pruning steps
+ * that the executor can use to prune away unneeded partitions. If
+ * suitable quals are found then a PartitionPruneInfo is built and tagged
+ * onto the PlannerInfo's partPruneInfos list.
+ *
+ * The return value is the 0-based index of the item added to the
+ * partPruneInfos list or -1 if nothing was added.
*
* 'parentrel' is the RelOptInfo for an appendrel, and 'subpaths' is the list
* of scan paths for its child rels.
* 'prunequal' is a list of potential pruning quals (i.e., restriction
* clauses that are applicable to the appendrel).
*/
-PartitionPruneInfo *
+int
make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *subpaths,
List *prunequal)
@@ -332,7 +336,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* quals, then we can just not bother with run-time pruning.
*/
if (prunerelinfos == NIL)
- return NULL;
+ return -1;
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
@@ -358,7 +362,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
else
pruneinfo->other_subplans = NULL;
- return pruneinfo;
+ root->partPruneInfos = lappend(root->partPruneInfos, pruneinfo);
+
+ return list_length(root->partPruneInfos) - 1;
}
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 708435e952..bf962af7af 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -123,9 +123,8 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
- PartitionPruneInfo *pruneinfo,
+ int part_prune_index,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
-
#endif /* EXECPARTITION_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 01b1727fc0..4a741b053f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -611,6 +611,7 @@ typedef struct EState
struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
+ List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 09342d128d..fbe75dca0f 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -122,6 +122,9 @@ typedef struct PlannerGlobal
/* "flat" list of AppendRelInfos */
List *appendRelations;
+ /* List of PartitionPruneInfo contained in the plan */
+ List *partPruneInfos;
+
/* OIDs of relations the plan depends on */
List *relationOids;
@@ -503,6 +506,9 @@ struct PlannerInfo
/* Does this query modify any partition key columns? */
bool partColsUpdated;
+
+ /* PartitionPruneInfos added in this query's plan. */
+ List *partPruneInfos;
};
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 5c2ab1b379..2e132afc5a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -70,6 +70,9 @@ typedef struct PlannedStmt
struct Plan *planTree; /* tree of Plan nodes */
+ List *partPruneInfos; /* List of PartitionPruneInfo contained in
+ * the plan */
+
List *rtable; /* list of RangeTblEntry nodes */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
@@ -270,8 +273,8 @@ typedef struct Append
*/
int first_partial_plan;
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index into PlannedStmt.partPruneInfos, or -1 if no run-time pruning */
+ int part_prune_index;
} Append;
/* ----------------
@@ -305,8 +308,8 @@ typedef struct MergeAppend
/* NULLS FIRST/LAST directions */
bool *nullsFirst pg_node_attr(array_size(numCols));
- /* Info for run-time subplan pruning; NULL if we're not doing that */
- struct PartitionPruneInfo *part_prune_info;
+ /* Index into PlannedStmt.partPruneInfos, or -1 if no run-time pruning */
+ int part_prune_index;
} MergeAppend;
/* ----------------
diff --git a/src/include/partitioning/partprune.h b/src/include/partitioning/partprune.h
index 90684efa25..ebf0dcff8c 100644
--- a/src/include/partitioning/partprune.h
+++ b/src/include/partitioning/partprune.h
@@ -70,10 +70,10 @@ typedef struct PartitionPruneContext
#define PruneCxtStateIdx(partnatts, step_id, keyno) \
((partnatts) * (step_id) + (keyno))
-extern PartitionPruneInfo *make_partition_pruneinfo(struct PlannerInfo *root,
- struct RelOptInfo *parentrel,
- List *subpaths,
- List *prunequal);
+extern int make_partition_pruneinfo(struct PlannerInfo *root,
+ struct RelOptInfo *parentrel,
+ List *subpaths,
+ List *prunequal);
extern Bitmapset *prune_append_rel_partitions(struct RelOptInfo *rel);
extern Bitmapset *get_matching_partitions(PartitionPruneContext *context,
List *pruning_steps);
--
2.35.3
* Re: generic plans and "initial" pruning
@ 2022-11-30 18:12 Alvaro Herrera <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Alvaro Herrera @ 2022-11-30 18:12 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
Looking at 0001, I wonder if we should have a crosscheck that a
PartitionPruneInfo you got from following an index is indeed constructed
for the relation that you think it is: previously, you were always sure
that the prune struct is for this node, because you followed a pointer
that was set up in the node itself. Now you only have an index, and you
have to trust that the index is correct.
I'm not sure how to implement this, or even if it's doable at all.
Keeping the OID of the partitioned table in the PartitionPruneInfo
struct is easy, but I don't know how to check it in ExecInitMergeAppend
and ExecInitAppend.
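To make the concern concrete, here is a minimal sketch in C of how an index, unlike a pointer, can silently resolve to the wrong entry. It uses toy stand-in types (ToyPruneInfo, ToyAppend, ToyGlobalList and the toy_* helpers are hypothetical, not PostgreSQL's structs or functions) and assumes the 0001 scheme in which a node's query-local index is shifted by the current length of the global list before that query's own entries are appended.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration; these are NOT PostgreSQL's structs. */
typedef struct ToyPruneInfo
{
    int parent_rti;         /* RT index of the table the info was built for */
} ToyPruneInfo;

typedef struct ToyAppend
{
    int parent_rti;         /* RT index of the table this node scans */
    int part_prune_index;   /* index into the global list, or -1 */
} ToyAppend;

#define TOY_MAX_INFOS 8

typedef struct ToyGlobalList
{
    ToyPruneInfo *items[TOY_MAX_INFOS];
    int           nitems;
} ToyGlobalList;

/*
 * Rebase a node's query-local index into the global list by adding the
 * global list's current length.  Must run before the query's own entries
 * are appended to the global list.
 */
static void
toy_rebase_index(ToyAppend *node, const ToyGlobalList *glob)
{
    if (node->part_prune_index >= 0)
        node->part_prune_index += glob->nitems;
}

/* Append one query-level entry to the global list. */
static void
toy_global_append(ToyGlobalList *glob, ToyPruneInfo *info)
{
    assert(glob->nitems < TOY_MAX_INFOS);
    glob->items[glob->nitems++] = info;
}

/*
 * Fetch by index.  The bounds check proves only that the index is in
 * range, not that the entry was built for this node.
 */
static ToyPruneInfo *
toy_lookup(const ToyGlobalList *glob, const ToyAppend *node)
{
    if (node->part_prune_index < 0 || node->part_prune_index >= glob->nitems)
        return NULL;
    return glob->items[node->part_prune_index];
}
```

A node whose index missed the rebasing step still passes the bounds check in toy_lookup(), but it receives an entry built for a different relation. That is the case a crosscheck would have to catch.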
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program
will work forever" (Oliver Silfridge)
* Re: generic plans and "initial" pruning
@ 2022-12-01 07:59 Amit Langote <[email protected]>
parent: Alvaro Herrera <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-12-01 07:59 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
Hi Alvaro,
Thanks for looking at this one.
On Thu, Dec 1, 2022 at 3:12 AM Alvaro Herrera <[email protected]> wrote:
> Looking at 0001, I wonder if we should have a crosscheck that a
> PartitionPruneInfo you got from following an index is indeed constructed
> for the relation that you think it is: previously, you were always sure
> that the prune struct is for this node, because you followed a pointer
> that was set up in the node itself. Now you only have an index, and you
> have to trust that the index is correct.
Yeah, a crosscheck sounds like a good idea.
> I'm not sure how to implement this, or even if it's doable at all.
> Keeping the OID of the partitioned table in the PartitionPruneInfo
> struct is easy, but I don't know how to check it in ExecInitMergeAppend
> and ExecInitAppend.
Hmm, how about keeping the [Merge]Append's parent relation's RT index
in the PartitionPruneInfo and passing it down to
ExecInitPartitionPruning() from ExecInit[Merge]Append() for
cross-checking? Both Append and MergeAppend already have an
'apprelids' field that we can save a copy of in the
PartitionPruneInfo. Tried that in the attached delta patch.
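To illustrate the shape of the cross-check outside of PostgreSQL, here is a minimal standalone sketch. It is not the patch's code: `RelidSet`, `ToyPruneInfo`, and `lookup_prune_info()` are hypothetical stand-ins, a 64-bit mask replaces the real Bitmapset, and a NULL return replaces the elog(ERROR) used in the delta patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset of RT indexes: bit i set means RT index i. */
typedef uint64_t RelidSet;

/* Hypothetical, simplified PartitionPruneInfo: only the field being checked. */
typedef struct ToyPruneInfo
{
	RelidSet	root_parent_relids;
} ToyPruneInfo;

/*
 * Fetch the prune info stored at part_prune_index and verify that it was
 * built for the same relation(s) as the plan node asking for it.
 *
 * Returns NULL if the index is out of range or the relids do not match;
 * the delta patch raises an error in the mismatch case instead.
 */
static const ToyPruneInfo *
lookup_prune_info(const ToyPruneInfo *infos, size_t ninfos,
				  size_t part_prune_index, RelidSet node_relids)
{
	if (part_prune_index >= ninfos)
		return NULL;
	if (infos[part_prune_index].root_parent_relids != node_relids)
		return NULL;
	return &infos[part_prune_index];
}
```

A caller playing the role of ExecInit[Merge]Append() would pass its own relids (the copy of 'apprelids') as node_relids, so a stale or wrong index is caught at lookup time rather than silently used.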
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] PartitionPruneInfo-relids.patch (5.3K, 2-PartitionPruneInfo-relids.patch)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 2bd069d889..9a631a9192 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1791,6 +1791,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* Initialize data structure needed for run-time partition pruning and
* do initial pruning if needed
*
+ * 'root_parent_relids' identifies the relation to which both the parent plan
+ * and the PartitionPruneInfo given by 'part_prune_index' belong.
+ *
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
* Initial pruning is performed here if needed and in that case only the
@@ -1804,6 +1807,7 @@ PartitionPruneState *
ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
PartitionPruneState *prunestate;
@@ -1811,6 +1815,14 @@ ExecInitPartitionPruning(PlanState *planstate,
PartitionPruneInfo *pruneinfo = list_nth(estate->es_part_prune_infos,
part_prune_index);
+ /* Sanity: part_prune_index gives the correct PartitionPruneInfo. */
+ if (!bms_equal(root_parent_relids, pruneinfo->root_parent_relids))
+ elog(ERROR, "wrong relids (%s) found in PartitionPruneInfo at part_prune_index=%d which has root_parent_relids=%s",
+ bmsToString(root_parent_relids),
+ part_prune_index,
+ bmsToString(pruneinfo->root_parent_relids));
+
+
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index c6f86a6510..99830198bd 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -146,6 +146,7 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
prunestate = ExecInitPartitionPruning(&appendstate->ps,
list_length(node->appendplans),
node->part_prune_index,
+ node->apprelids,
&validsubplans);
appendstate->as_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 8d35860c30..f370f9f287 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -94,6 +94,7 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
prunestate = ExecInitPartitionPruning(&mergestate->ps,
list_length(node->mergeplans),
node->part_prune_index,
+ node->apprelids,
&validsubplans);
mergestate->ms_prune_state = prunestate;
nplans = bms_num_members(validsubplans);
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 720f20f563..e67f0e3509 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -354,6 +354,8 @@ set_plan_references(PlannerInfo *root, Plan *plan)
PartitionPruneInfo *pruneinfo = lfirst(lc);
ListCell *l;
+ pruneinfo->root_parent_relids =
+ offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
foreach(l, pruneinfo->prune_infos)
{
List *prune_infos = lfirst(l);
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index 6565b6ed01..d48f6784c1 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -340,6 +340,7 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
/* Else build the result data structure */
pruneinfo = makeNode(PartitionPruneInfo);
+ pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
/*
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index bf962af7af..17fabc18c9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -124,6 +124,7 @@ typedef struct PartitionPruneState
extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
+ Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
bool initial_prune);
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 2e132afc5a..b2d6f8fb6e 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1407,6 +1407,8 @@ typedef struct PlanRowMark
* Then, since an Append-type node could have multiple partitioning
* hierarchies among its children, we have an unordered List of those Lists.
*
+ * root_parent_relids RelOptInfo.relids of the relation to which the parent
+ * plan node and this PartitionPruneInfo node belong
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
@@ -1419,6 +1421,7 @@ typedef struct PartitionPruneInfo
pg_node_attr(no_equal)
NodeTag type;
+ Bitmapset *root_parent_relids;
List *prune_infos;
Bitmapset *other_subplans;
} PartitionPruneInfo;
* Re: generic plans and "initial" pruning
@ 2022-12-01 11:21 Alvaro Herrera <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Alvaro Herrera @ 2022-12-01 11:21 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On 2022-Dec-01, Amit Langote wrote:
> Hmm, how about keeping the [Merge]Append's parent relation's RT index
> in the PartitionPruneInfo and passing it down to
> ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> cross-checking? Both Append and MergeAppend already have an
> 'apprelids' field that we can save a copy of in the
> PartitionPruneInfo. Tried that in the attached delta patch.
Ah yeah, that sounds about what I was thinking. I've merged that in and
pushed to github, which had a strange pg_upgrade failure on Windows
mentioning log files that were not captured by the CI tooling. So I
pushed another one trying to grab those files, in case it wasn't a
one-off failure. It's running now:
https://cirrus-ci.com/task/5857239638999040
If all goes well with this run, I'll get this 0001 pushed.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Investigación es lo que hago cuando no sé lo que estoy haciendo"
(Wernher von Braun)
* Re: generic plans and "initial" pruning
@ 2022-12-01 12:43 Amit Langote <[email protected]>
parent: Alvaro Herrera <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-12-01 12:43 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera <[email protected]> wrote:
> On 2022-Dec-01, Amit Langote wrote:
> > Hmm, how about keeping the [Merge]Append's parent relation's RT index
> > in the PartitionPruneInfo and passing it down to
> > ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> > cross-checking? Both Append and MergeAppend already have an
> > 'apprelids' field that we can save a copy of in the
> > PartitionPruneInfo. Tried that in the attached delta patch.
>
> Ah yeah, that sounds about what I was thinking. I've merged that in and
> pushed to github, which had a strange pg_upgrade failure on Windows
> mentioning log files that were not captured by the CI tooling. So I
> pushed another one trying to grab those files, in case it wasn't a
> one-off failure. It's running now:
> https://cirrus-ci.com/task/5857239638999040
>
> If all goes well with this run, I'll get this 0001 pushed.
Thanks for pushing 0001.
Rebased 0002 attached.
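For anyone skimming the patch, the constraint it revolves around can be shown with a small standalone sketch: initial pruning is done once while taking locks, and the init step must reuse that saved result rather than prune again, because a second pruning pass could pick a subplan whose relation was never locked. Everything below is a toy model under stated assumptions; `ToyAppend`, `toy_initial_prune()`, `toy_acquire_locks()`, and `toy_init_append()` are hypothetical names, and range checks on an integer key stand in for real pruning steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUBPLANS 64

/* Toy Append node: subplan i scans a partition holding keys in [lower[i], upper[i]). */
typedef struct ToyAppend
{
	int			nsubplans;		/* must not exceed MAX_SUBPLANS */
	int			lower[MAX_SUBPLANS];
	int			upper[MAX_SUBPLANS];
} ToyAppend;

/* "Initial" pruning: keep only the subplans whose range can contain key. */
static uint64_t
toy_initial_prune(const ToyAppend *node, int key)
{
	uint64_t	valid = 0;

	for (int i = 0; i < node->nsubplans; i++)
		if (key >= node->lower[i] && key < node->upper[i])
			valid |= UINT64_C(1) << i;
	return valid;
}

/* Lock phase: prune once, lock only the survivors, return the saved result. */
static uint64_t
toy_acquire_locks(const ToyAppend *node, int key, bool locked[])
{
	uint64_t	valid = toy_initial_prune(node, key);

	for (int i = 0; i < node->nsubplans; i++)
		locked[i] = ((valid >> i) & 1) != 0;
	return valid;
}

/*
 * Init phase: initialize the subplans named in saved_valid.  Returns the
 * number of subplans initialized, or -1 if the set names a subplan whose
 * relation was not locked.
 */
static int
toy_init_append(const ToyAppend *node, uint64_t saved_valid, const bool locked[])
{
	int			ninit = 0;

	for (int i = 0; i < node->nsubplans; i++)
	{
		if (((saved_valid >> i) & 1) == 0)
			continue;			/* notionally pruned; skip */
		if (!locked[i])
			return -1;			/* would touch an unlocked relation */
		ninit++;
	}
	return ninit;
}
```

Passing the set returned by toy_acquire_locks() to toy_init_append() always succeeds; passing a freshly recomputed set for a different key is the failure mode that carrying the pruning result along with the plan is meant to rule out.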
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v25-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (82.4K, 2-v25-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From cff400af6c264d7a2651faec4d963e987797f588 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v25] Optimize AcquireExecutorLocks() by locking only unpruned
partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan, and to skip locking the partitions scanned by
those subnodes.
The result of performing initial partition pruning this way, before the
actual execution has started, is passed to the executor as a
PartitionPruneResult, which the callers of the executor that used
plancache.c to get the plan supply along with the PlannedStmt. It is NULL
when the plan is obtained by calling the planner directly or when the
plan obtained by plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 51 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 238 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 781 insertions(+), 98 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..5c59ac5da7 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed to
+figure out the minimal set of child subplans that satisfy those pruning steps.
+AcquireExecutorLocks() looking at a given generic plan will then lock only the
+relations scanned by the child subplans that survived such pruning, along with
+those present in PlannedStmt.minLockRelids. Note that the subplans are only
+notionally pruned, that is, they are not removed from the plan tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a List
+of PartitionPruneResult nodes via the QueryDesc. Each PartitionPruneResult
+consists of the set of indexes of surviving subplans in the respective parent
+plan node's (the one to which the corresponding PartitionPruneInfo belongs)
+list of child subplans, saved as a bitmapset (valid_subplan_offs). In other
+words, the executor executing a generic plan should not re-evaluate the set of
+initially valid subplans for a given plan node by redoing the initial pruning
+if it was already done by AcquireExecutorLocks() when validating the plan.
+Such re-evaluation of the pruning steps may very well end up resulting in a
+different set of subplans, containing some whose relations were not locked by
+AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b6751da574..7a4db80104 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,54 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a List of PartitionPruneResult nodes, one for every
+ * PartitionPruneInfo, each containing the bitmapset described above. The RT
+ * indexes of all the leaf partitions scanned by the chosen subplans are
+ * returned in *scan_leafpart_rtis, which is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +876,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(¶mspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 8e6453aec2..13e450c0fa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1758,8 +1764,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1776,6 +1784,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * to be executed of the parent plan node to which the PartitionPruneInfo
+ * belongs and also the set of the RT indexes of leaf partitions that will
+ * be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1796,8 +1811,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1810,9 +1826,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1828,20 +1845,57 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /*
+ * No need to do initial pruning if it was already done by
+ * ExecutorDoInitialPruning(), which is the case if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ }
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always includes detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1849,7 +1903,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1865,11 +1920,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. That's okay because the initial pruning steps
+ * contain nothing that requires execution to have started, so they
+ * don't need the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1883,19 +2001,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1950,15 +2070,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or get
+ * destroyed) as long as the relation is locked. The partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1972,6 +2119,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1982,6 +2130,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2032,6 +2182,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2039,6 +2191,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2060,7 +2213,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2070,7 +2223,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2298,10 +2451,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2336,7 +2493,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2350,6 +2507,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2360,13 +2519,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2393,8 +2554,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2402,7 +2569,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9695de85b9..dce93a8c9f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -135,6 +135,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 23776367c5..b01f55fb4f 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -800,7 +805,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 799602f5ea..a96d316dca 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e67f0e3509..5820f26fdb 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
pruneinfo->root_parent_relids =
@@ -364,15 +375,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also adjust the RT indexes of leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
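(Illustration only, not part of the patch.) The minLockRelids bookkeeping in the setrefs.c hunks above can be sketched roughly as follows. The sketch uses a plain 64-bit mask in place of a Bitmapset, and the names rtiset, add_range, del_members and final_lock_set are made up for this example. Only the overall idea is taken from the patch: start with every RT index of the query, drop the leaf partitions that are subject to initial pruning, and have the lock-taking code add back the ones that survive pruning.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit N set means RT index N is a member. */
typedef uint64_t rtiset;

/* Add RT indexes lower..upper (inclusive; assumes 1 <= lower <= upper <= 63). */
static rtiset
add_range(rtiset set, int lower, int upper)
{
    for (int i = lower; i <= upper; i++)
        set |= (rtiset) 1 << i;
    return set;
}

/* Remove every member of 'del' from 'set'. */
static rtiset
del_members(rtiset set, rtiset del)
{
    return set & ~del;
}

/*
 * Relations to lock before execution: the plan-time minimal set plus the
 * leaf partitions that survived initial pruning.
 */
static rtiset
final_lock_set(rtiset min_lock, rtiset surviving_leafparts)
{
    return min_lock | surviving_leafparts;
}
```

With RT indexes 1..5, where 1 is the partitioned parent and 2..5 are prunable leaf partitions, the plan-time set holds only index 1, and a pruning result that keeps partition 3 yields the final set {1, 3}.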
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..d5556354f7 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +477,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +568,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * The first pass also determines whether the 2nd pass is necessary by
+ * checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +645,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +678,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +691,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +706,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +731,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
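(Illustration only, not part of the patch.) The rti_map array built above runs parallel to subplan_map, so the executor-side loop can collect the RT indexes of surviving leaf partitions together with their subplan indexes. A minimal sketch of that pairing, assuming a fixed partition count and 64-bit masks instead of Bitmapsets; prune_maps and collect_survivors are hypothetical names, and recursion into sub-partitioned tables is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPARTS 4

/* Hypothetical, simplified stand-in for PartitionedRelPruningData. */
typedef struct
{
    int      subplan_map[NPARTS];   /* subplan index per partition, or -1 */
    unsigned rti_map[NPARTS];       /* RT index per partition, or 0 */
} prune_maps;

/*
 * For every partition index set in 'surviving' (bit i = partition i) that
 * has a subplan, add its subplan index to *valid_subplans and, if the
 * caller asked for it, its RT index to *leafpart_rtis.
 */
static void
collect_survivors(const prune_maps *maps, uint32_t surviving,
                  uint64_t *valid_subplans, uint64_t *leafpart_rtis)
{
    for (int i = 0; i < NPARTS; i++)
    {
        if ((surviving & (1u << i)) == 0 || maps->subplan_map[i] < 0)
            continue;
        *valid_subplans |= (uint64_t) 1 << maps->subplan_map[i];
        if (leafpart_rtis != NULL)
            *leafpart_rtis |= (uint64_t) 1 << maps->rti_map[i];
    }
}
```

Passing NULL for the last argument mirrors the callers in the patch that are not interested in the RT indexes.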
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..95ab1d0eef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..280ed7d239 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..af6fae6e3b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that no
+ * partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResult or a NIL is added to
+ * *part_prune_results_list. The former if the PlannedStmt is from
+ * the existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * those "initial" steps are performed by calling ExecutorDoInitialPruning()
+ * to determine only those leaf partitions that need to be locked by
+ * AcquireExecutorLocks() by pruning away subplans that don't match the
+ * "initial" pruning conditions. For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the bitmapset
+ * of the indexes of surviving subplans is added to the List for the
+ * PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..5b9098971b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ Assert(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -127,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..c9a5e5fb68 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a2008846c6..369de42caf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -615,6 +615,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index dd4eb8679d..36abe4cf9e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 2e202892a7..0cab6958d7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1414,6 +1423,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1425,6 +1441,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1469,6 +1487,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1553,6 +1574,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before the execution has started. A module that needs to do so
+ * should call ExecutorDoInitialPruning() on a given PlannedStmt, which
+ * returns a List of PartitionPruneResult containing an entry for each
+ * PartitionPruneInfo present in PlannedStmt.partPruneInfos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
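The locking scheme in the patch above can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not code from the patch: plain 64-bit masks stand in for Bitmapset, a counter array stands in for the lock manager, and every sketch_* name is hypothetical. It shows the two ideas that AcquireExecutorLocks() and ReleaseExecutorLocks() rely on: lock the union of minLockRelids and the leaf partitions that survive initial pruning, and remember exactly which RT indexes were locked so that the race-case cleanup releases only those.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_RELS 64

/* Hypothetical lock table: how many times each RT index is currently locked. */
int sketch_lock_count[SKETCH_MAX_RELS];

/*
 * Lock every relation in min_lock_relids plus the leaf partitions that
 * survived initial pruning.  Returns the set of RT indexes actually locked.
 */
uint64_t
sketch_acquire_locks(uint64_t min_lock_relids, uint64_t surviving_leaf_rtis)
{
    uint64_t all_lock_relids = min_lock_relids | surviving_leaf_rtis;
    uint64_t locked = 0;
    int      rti;

    /* RT indexes start at 1, so bit 0 is never used. */
    for (rti = 1; rti < SKETCH_MAX_RELS; rti++)
    {
        if (all_lock_relids & (UINT64_C(1) << rti))
        {
            sketch_lock_count[rti]++;
            locked |= UINT64_C(1) << rti;
        }
    }
    return locked;
}

/* Release exactly the locks recorded by an earlier sketch_acquire_locks(). */
void
sketch_release_locks(uint64_t locked)
{
    int rti;

    for (rti = 1; rti < SKETCH_MAX_RELS; rti++)
    {
        if (locked & (UINT64_C(1) << rti))
        {
            assert(sketch_lock_count[rti] > 0);
            sketch_lock_count[rti]--;
        }
    }
}
```

For example, with RT index 1 in the minimum set and partitions at RT indexes 3 and 5 surviving pruning, only those three entries get locked; a pruned partition at RT index 4 is never touched, and passing the returned mask to sketch_release_locks() undoes exactly those three locks.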
* Re: generic plans and "initial" pruning
@ 2022-12-02 10:40 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-12-02 10:40 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; Zhihong Yu <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Thu, Dec 1, 2022 at 9:43 PM Amit Langote <[email protected]> wrote:
> On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera <[email protected]> wrote:
> > On 2022-Dec-01, Amit Langote wrote:
> > > Hmm, how about keeping the [Merge]Append's parent relation's RT index
> > > in the PartitionPruneInfo and passing it down to
> > > ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> > > cross-checking? Both Append and MergeAppend already have a
> > > 'apprelids' field that we can save a copy of in the
> > > PartitionPruneInfo. Tried that in the attached delta patch.
> >
> > Ah yeah, that sounds about what I was thinking. I've merged that in and
> > pushed to github, which had a strange pg_upgrade failure on Windows
> > mentioning log files that were not captured by the CI tooling. So I
> > pushed another one trying to grab those files, in case it wasn't a
> > one-off failure. It's running now:
> > https://cirrus-ci.com/task/5857239638999040
> >
> > If all goes well with this run, I'll get this 0001 pushed.
>
> Thanks for pushing 0001.
>
> Rebased 0002 attached.
I thought it might be good for PartitionPruneResult to also carry a
root_parent_relids that matches that of the corresponding
PartitionPruneInfo. With that, ExecInitPartitionPruning() does a sanity
check that the root_parent_relids of a given pair of PartitionPrune{Info |
Result} match.
I'm posting the patch separately as the attached 0002, in case you
think the extra cross-checking is overkill.
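To illustrate the cross-check described above, here is a minimal standalone sketch. It is not code from the patch: plain 64-bit masks stand in for Bitmapset, and the Sketch* types and sketch_pair_matches() are hypothetical names used only for this illustration. The idea is that the result carries a copy of the info's root_parent_relids, so a result picked up at the wrong part_prune_index can be detected by comparing the two sets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PartitionPruneInfo. */
typedef struct SketchPruneInfo
{
    uint64_t root_parent_relids;    /* bit i set => RT index i is a root parent */
} SketchPruneInfo;

/* Hypothetical stand-in for PartitionPruneResult. */
typedef struct SketchPruneResult
{
    uint64_t root_parent_relids;    /* copied from the info at pruning time */
    uint64_t valid_subplan_offs;    /* subplans that survived initial pruning */
} SketchPruneResult;

/*
 * Returns 1 if the info and result found at the same list index describe
 * the same parent plan node, 0 otherwise.
 */
int
sketch_pair_matches(const SketchPruneInfo *info, const SketchPruneResult *result)
{
    assert(info != NULL && result != NULL);
    return info->root_parent_relids == result->root_parent_relids;
}
```

A pair whose masks are both 0x12 matches and the function returns 1; if the result's mask is 0x04 instead, it returns 0, which corresponds to the case the patch reports as an internal error.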
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v26-0002-Add-root_parent_relids-to-PartitionPruneResult.patch (3.4K, 2-v26-0002-Add-root_parent_relids-to-PartitionPruneResult.patch)
From f1af32816635254773386630b634835bd26d1227 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 2 Dec 2022 19:32:14 +0900
Subject: [PATCH v26 2/2] Add root_parent_relids to PartitionPruneResult
It's the same as the corresponding PartitionPruneInfo's root_parent_relids.
Like PartitionPruneInfo.root_parent_relids, it's there for
cross-checking that a PartitionPruneResult found at a given plan node's
part_prune_index actually matches the plan node.
---
src/backend/executor/execMain.c | 2 ++
src/backend/executor/execPartition.c | 13 +++++++++++--
src/include/nodes/plannodes.h | 7 +++++++
3 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7a4db80104..1e84e47d46 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -145,6 +145,8 @@ ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
PartitionPruneInfo *pruneinfo = lfirst(lc);
PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+ pruneresult->root_parent_relids =
+ bms_copy(pruneinfo->root_parent_relids);
pruneresult->valid_subplan_offs =
ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
scan_leafpart_rtis);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 13e450c0fa..eda14d6241 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1852,8 +1852,17 @@ ExecInitPartitionPruning(PlanState *planstate,
*/
if (estate->es_part_prune_results)
{
- pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
- Assert(IsA(pruneresult, PartitionPruneResult));
+ pruneresult = list_nth_node(PartitionPruneResult,
+ estate->es_part_prune_results,
+ part_prune_index);
+ if (!bms_equal(pruneresult->root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("pruneresult relids %s, pruneinfo relids %s",
+ bmsToString(pruneresult->root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
}
if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 0cab6958d7..30f51414e9 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1580,6 +1580,12 @@ typedef struct PartitionPruneStepCombine
* The result of performing ExecPartitionDoInitialPruning() on a given
* PartitionPruneInfo.
*
+ * root_parent_relids is the same as PartitionPruneInfo.root_parent_relids.
+ * It's there for cross-checking in ExecInitPartitionPruning() that the
+ * PartitionPruneResult and the PartitionPruneInfo at a given index in
+ * EState.es_part_prune_results and EState.es_part_prune_infos, respectively,
+ * belong to the same parent plan node.
+ *
* valid_subplan_offs contains the indexes of subplans remaining after
* performing initial pruning by calling ExecFindMatchingSubPlans() on the
* PartitionPruneInfo.
@@ -1597,6 +1603,7 @@ typedef struct PartitionPruneResult
{
NodeTag type;
+ Bitmapset *root_parent_relids;
Bitmapset *valid_subplan_offs;
} PartitionPruneResult;
--
2.35.3
[application/octet-stream] v26-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (82.5K, 3-v26-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From d8b8185b6ceb2a2a33a6af142f23a59fd93d5cdc Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v26 1/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan and skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way before the
actual execution has started is made available to the actual execution via
PartitionPruneResult, made available along with the PlannedStmt by the
callers of the executor that used plancache.c to get the plan. It is NULL
in the cases in which the plan is obtained by calling the planner
directly or if the plan obtained by plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 32 ++++
src/backend/executor/execMain.c | 51 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 238 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 781 insertions(+), 98 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..5c59ac5da7 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,34 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+Actually, the so-called execution time pruning may also occur even before the
+execution has started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c: GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed to
+figure out the minimal set of child subplans that satisfy those pruning steps.
+AcquireExecutorLocks() looking at a given generic plan will then lock only the
+relations scanned by the child subplans that survived such pruning, along with
+those present in PlannedStmt.minLockRelids. Note that the subplans are only
+notionally pruned, that is, they are not removed from the plan tree as such.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of pruning is passed to the executor as a List
+of PartitionPruneResult nodes via the QueryDesc. Each PartitionPruneResult
+consists of the set of indexes of surviving subplans in the respective parent
+plan node's (the one to which the corresponding PartitionPruneInfo belongs)
+list of child subplans, saved as a bitmapset (valid_subplan_offs). In other
+words, the executor executing a generic plan should not re-evaluate the set of
+initially valid subplans for a given plan node by redoing the initial pruning
+if it was already done by AcquireExecutorLocks() when validating the plan.
+Such re-evaluation of the pruning steps may very well end up resulting in a
+different set of subplans, containing some whose relations were not locked by
+AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +314,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
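The README text above says the executor should reuse the pruning result
computed at lock time rather than prune again. A minimal sketch of that
three-way choice follows, under stated assumptions: ToyPruneResult,
ToyPruneFn and toy_initially_valid_subplans() are illustrative names
only, and a 64-bit mask again replaces a Bitmapset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i set means member i is present. */
typedef uint64_t ToySet;

/* Hypothetical counterpart of a PartitionPruneResult. */
typedef struct ToyPruneResult
{
	ToySet		valid_subplan_offs;
} ToyPruneResult;

/* Hypothetical callback that performs initial pruning from scratch. */
typedef ToySet (*ToyPruneFn) (void *arg);

/*
 * Pick the initially valid subplans:
 *   1. reuse the result computed at lock time, if there is one;
 *   2. otherwise run initial pruning now, if the node has such steps;
 *   3. otherwise every subplan is valid.
 */
static ToySet
toy_initially_valid_subplans(const ToyPruneResult *pruneresult,
							 ToyPruneFn initial_prune, void *prune_arg,
							 int n_total_subplans)
{
	assert(n_total_subplans > 0 && n_total_subplans <= 64);

	if (pruneresult != NULL)
		return pruneresult->valid_subplan_offs;
	if (initial_prune != NULL)
		return initial_prune(prune_arg);
	if (n_total_subplans == 64)
		return ~(ToySet) 0;
	return ((ToySet) 1 << n_total_subplans) - 1;
}

/* Example callback: counts its invocations and keeps only subplan 2. */
static ToySet
toy_counting_prune(void *arg)
{
	int		   *calls = arg;

	(*calls)++;
	return (ToySet) 1 << 2;
}
```

The call counter in toy_counting_prune() makes the point of the README
paragraph checkable: when a precomputed result is supplied, the pruning
callback is never invoked, so the executor cannot end up with a subplan
whose relation was left unlocked.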
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b6751da574..7a4db80104 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,54 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a List of PartitionPruneResult nodes, one for every
+ * PartitionPruneInfo, each holding the bitmapset of surviving subplan indexes.
+ * RT indexes of all the leaf partitions scanned by those subplans are added
+ * to *scan_leafpart_rtis, which is shared across all PartitionPruneInfos.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +855,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +876,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 8e6453aec2..13e450c0fa 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1758,8 +1764,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1776,6 +1784,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * to be executed of the parent plan node to which the PartitionPruneInfo
+ * belongs and also the set of the RT indexes of leaf partitions that will
+ * be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1796,8 +1811,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1810,9 +1826,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1828,20 +1845,57 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ }
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always includes detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1849,7 +1903,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1865,11 +1920,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started and thus needs the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1883,19 +2001,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1950,15 +2070,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as, when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key from its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1972,6 +2119,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1982,6 +2130,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2032,6 +2182,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2039,6 +2191,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2060,7 +2213,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2070,7 +2223,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2298,10 +2451,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2336,7 +2493,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2350,6 +2507,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2360,13 +2519,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2393,8 +2554,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2402,7 +2569,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9695de85b9..dce93a8c9f 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -135,6 +135,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 23776367c5..b01f55fb4f 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -800,7 +805,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 799602f5ea..a96d316dca 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index e67f0e3509..5820f26fdb 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -352,6 +362,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach (lc, root->partPruneInfos)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
+ Bitmapset *leafpart_rtis = NULL;
ListCell *l;
pruneinfo->root_parent_relids =
@@ -364,15 +375,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
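
Editorial note, not part of the patch: the minLockRelids bookkeeping in the setrefs.c hunks above can be illustrated with a toy model. This is only a sketch under stated assumptions: RelidSet, relids_range(), min_lock_relids() and all_lock_relids() are hypothetical names, and a 64-bit mask stands in for a Bitmapset. The planner starts from the full range of RT indexes and removes the leaf partitions subject to "initial" pruning; at lock time the survivors of pruning are added back.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t RelidSet;

/* All RT indexes in [lo, hi], in the spirit of bms_add_range(NULL, lo, hi). */
static RelidSet
relids_range(int lo, int hi)
{
    RelidSet    s = 0;

    assert(lo >= 1 && hi < 64);
    for (int i = lo; i <= hi; i++)
        s |= (RelidSet) 1 << i;
    return s;
}

/* Plan time: every relation except leaf partitions that may be pruned. */
static RelidSet
min_lock_relids(int nrels, RelidSet prunable_leafparts)
{
    return relids_range(1, nrels) & ~prunable_leafparts;
}

/* Lock time: the minimal set plus leaf partitions that survived pruning. */
static RelidSet
all_lock_relids(RelidSet min_relids, RelidSet surviving_leafparts)
{
    return min_relids | surviving_leafparts;
}
```

With five range table entries where RT indexes 2..4 are prunable leaf partitions and only partition 3 survives pruning, the relations left to lock are RT indexes 1, 3 and 5.
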
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..d5556354f7 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +477,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +568,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +645,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +678,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +691,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +706,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +731,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..95ab1d0eef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..280ed7d239 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..af6fae6e3b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that no
+ * partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResult or NIL is added to
+ * *part_prune_results_list. It is the former if the PlannedStmt is from
+ * the existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * those "initial" steps are performed by calling ExecutorDoInitialPruning()
+ * to determine only those leaf partitions that need to be locked by
+ * AcquireExecutorLocks() by pruning away subplans that don't match the
+ * "initial" pruning conditions. For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the bitmapset
+ * of the indexes of surviving subplans is added to the List for the
+ * PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
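+ /* Keep both output lists aligned with stmt_list. */
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, NULL);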
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..5b9098971b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ Assert(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -127,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ed95ed1176..c9a5e5fb68 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index a2008846c6..369de42caf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -615,6 +615,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index a80f43e540..937cc4629d 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -212,6 +212,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index dd4eb8679d..36abe4cf9e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 2e202892a7..0cab6958d7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in
* the plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1414,6 +1423,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1425,6 +1441,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1469,6 +1487,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1553,6 +1574,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before execution has started. A module that needs to do so
+ * should call ExecutorDoInitialPruning() on a given PlannedStmt, which
+ * returns a List of PartitionPruneResult containing an entry for each
+ * PartitionPruneInfo present in PlannedStmt.partPruneInfos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
* Re: generic plans and "initial" pruning
@ 2022-12-05 03:00 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-12-05 03:00 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Fri, Dec 2, 2022 at 7:40 PM Amit Langote <[email protected]> wrote:
> On Thu, Dec 1, 2022 at 9:43 PM Amit Langote <[email protected]> wrote:
> > On Thu, Dec 1, 2022 at 8:21 PM Alvaro Herrera <[email protected]> wrote:
> > > On 2022-Dec-01, Amit Langote wrote:
> > > > Hmm, how about keeping the [Merge]Append's parent relation's RT index
> > > > in the PartitionPruneInfo and passing it down to
> > > > ExecInitPartitionPruning() from ExecInit[Merge]Append() for
> > > > cross-checking? Both Append and MergeAppend already have a
> > > > 'apprelids' field that we can save a copy of in the
> > > > PartitionPruneInfo. Tried that in the attached delta patch.
> > >
> > > Ah yeah, that sounds about what I was thinking. I've merged that in and
> > > pushed to github, which had a strange pg_upgrade failure on Windows
> > > mentioning log files that were not captured by the CI tooling. So I
> > > pushed another one trying to grab those files, in case it wasn't a
> > > one-off failure. It's running now:
> > > https://cirrus-ci.com/task/5857239638999040
> > >
> > > If all goes well with this run, I'll get this 0001 pushed.
> >
> > Thanks for pushing 0001.
> >
> > Rebased 0002 attached.
>
> Thought it might be good for PartitionPruneResult to also have
> root_parent_relids that matches with the corresponding
> PartitionPruneInfo. ExecInitPartitionPruning() does a sanity check
> that the root_parent_relids of a given pair of PartitionPrune{Info |
> Result} match.
>
> Posting the patch separately as the attached 0002, just in case you
> think the extra cross-checking is overkill.
Rebased over 92c4dafe1eed and fixed some factual mistakes in the
comment above ExecutorDoInitialPruning().
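The cross-check mentioned in the quoted message above can be sketched in
isolation roughly as follows. This is only an illustration, not code from
the patch: RelidSet, ToyPruneInfo, ToyPruneResult and both toy_* functions
are hypothetical stand-ins, and a plain 64-bit mask is assumed in place of
a real Bitmapset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset of RT indexes: bit i set => RT index i. */
typedef uint64_t RelidSet;

/* Hypothetical, simplified counterpart of PartitionPruneInfo. */
typedef struct ToyPruneInfo
{
	RelidSet	root_parent_relids; /* parent rels of the owning [Merge]Append */
} ToyPruneInfo;

/* Hypothetical, simplified counterpart of PartitionPruneResult. */
typedef struct ToyPruneResult
{
	RelidSet	root_parent_relids; /* copied from the matching ToyPruneInfo */
	RelidSet	valid_subplan_offs; /* subplans surviving initial pruning */
} ToyPruneResult;

/* Build a result for "info", remembering which info it was computed for. */
static ToyPruneResult
toy_make_prune_result(const ToyPruneInfo *info, RelidSet valid_subplan_offs)
{
	ToyPruneResult result;

	result.root_parent_relids = info->root_parent_relids;
	result.valid_subplan_offs = valid_subplan_offs;
	return result;
}

/* Sanity check: was "result" computed for "info"? */
static bool
toy_prune_result_matches(const ToyPruneInfo *info,
						 const ToyPruneResult *result)
{
	return info->root_parent_relids == result->root_parent_relids;
}
```

In this sketch a consumer would check toy_prune_result_matches() before
trusting valid_subplan_offs, so that a result paired with the wrong plan
node is caught early instead of silently selecting the wrong subplans.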
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v27-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (82.9K, 2-v27-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From 6c4cf0b0a03bfac62e87f76bb3be9c1e62125a0c Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v27 1/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning on a generic cached plan, so that it can identify
the subnodes that need not be initialized during the actual execution
of the plan and skip locking the partitions scanned by those subnodes.
The result of performing initial partition pruning this way, before the
actual execution has started, is passed to the executor as a List of
PartitionPruneResult nodes along with the PlannedStmt by the callers of
the executor that used plancache.c to get the plan. The list is NIL when
the plan is obtained by calling the planner directly or when the plan
obtained from plancache.c is not a generic one.
---
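The locking rule described in the commit message above can be sketched
outside the backend roughly as follows. This is a simplified illustration
under stated assumptions, not the patch's code: RtiSet is a hypothetical
64-bit stand-in for a Bitmapset of RT indexes, all rti_set_* and toy_*
names are made up for the example, and "locking" just records the RT
index where the real code calls LockRelationOid().

```c
#include <assert.h>
#include <stdint.h>

/* Toy Bitmapset: bit i set => RT index i is a member (valid i: 1..63). */
typedef uint64_t RtiSet;

/* Return "set" with RT index "rti" added. */
static RtiSet
rti_set_add(RtiSet set, int rti)
{
	assert(rti > 0 && rti < 64);
	return set | ((RtiSet) 1 << rti);
}

/*
 * Return the smallest member greater than "prev", or -1 if there is none.
 * Start an iteration with prev = -1.
 */
static int
rti_set_next(RtiSet set, int prev)
{
	for (int rti = prev + 1; rti < 64; rti++)
	{
		if (set & ((RtiSet) 1 << rti))
			return rti;
	}
	return -1;
}

/*
 * Relations to lock: those that must always be locked, plus the leaf
 * partitions whose subplans survived initial pruning.
 */
static RtiSet
toy_relids_to_lock(RtiSet min_lock_relids, RtiSet scan_leafpart_rtis)
{
	return min_lock_relids | scan_leafpart_rtis;
}

/*
 * "Lock" every member of all_lock_relids in ascending RT index order,
 * recording each one in locked_rtis[] so the caller can release exactly
 * what was taken.  Returns the number of entries written.
 */
static int
toy_acquire_locks(RtiSet all_lock_relids, int *locked_rtis, int max_locked)
{
	int			n = 0;
	int			rti = -1;

	while ((rti = rti_set_next(all_lock_relids, rti)) >= 0)
	{
		assert(n < max_locked);
		locked_rtis[n++] = rti; /* the backend would take the lock here */
	}
	return n;
}
```

Recording what was actually locked mirrors the role of
lockedRelids_per_stmt: the release path then needs no second round of
pruning to know which locks to drop.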
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 36 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 238 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 208 ++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 787 insertions(+), 98 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..7f8cf1494f 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,38 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+The so-called execution time pruning may also occur even before the execution
+has actually started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c:GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed as part
+of the plan validation step, by calling ExecutorDoInitialPruning(). That
+returns the minimal set of child subplans that satisfy those initial pruning
+steps contained in each PartitionPruneInfo. AcquireExecutorLocks() will then
+lock only the relations scanned by those subplans, in addition to those present
+in PlannedStmt.minLockRelids. Note that the pruned subplans are not actually
+removed from the plan tree, so downstream users of a plan that has undergone
+pre-execution initial pruning must take care.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of that pruning is passed to the executor as a
+List of PartitionPruneResult nodes via the QueryDesc, which is subsequently
+assigned to EState.es_part_prune_results. Each PartitionPruneResult therein
+consists of the set of indexes of surviving subplans in the respective parent
+plan node's (the one to which the corresponding PartitionPruneInfo belongs)
+list of child subplans, saved as a bitmapset valid_subplan_offs. The executor
+or any third party execution code working on a generic plan should not
+re-evaluate the set of initially valid subplans for a given plan node by
+redoing the initial pruning if a PartitionPruneResult belonging to that plan
+node is present in es_part_prune_results. Note that this is not simply a
+performance optimization, because such re-evaluation of the pruning steps may
+very well end up resulting in a different set of initially valid subplans,
+containing some whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +318,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 12ff4f3de5..4d8c8e2e43 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a List of PartitionPruneResult nodes, one for each
+ * PartitionPruneInfo found in plannedstmt->partPruneInfos, each
+ * containing a bitmapset of the indexes of unpruned child subplans.
+ * A bitmapset of the RT indexes of the leaf partitions scanned by those
+ * subplans is returned in *scan_leafpart_rtis, which is shared across all
+ * of those PartitionPruneResults.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst(lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
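
For readers following the execParallel.c changes above: the leader turns the List of PartitionPruneResult into a NUL-terminated string, reserves strlen + 1 bytes, copies the string into shared memory under a new key, and each worker looks it up by that key. A minimal standalone sketch of that round trip follows. It assumes a hypothetical fixed-size toy_toc in place of the real shm_toc API; toy_toc_store, toy_toc_lookup and TOY_KEY_PRUNE_RESULTS are illustrative names only, not PostgreSQL functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_TOC_SPACE 256
#define TOY_TOC_MAX_KEYS 4
/* Hypothetical key, mirroring the shape of the PARALLEL_KEY_* constants. */
#define TOY_KEY_PRUNE_RESULTS UINT64_C(0xE00000000000000B)

/* Toy stand-in for a shared-memory table of contents (not shm_toc). */
typedef struct toy_toc
{
    char     space[TOY_TOC_SPACE];
    size_t   used;
    int      nkeys;
    uint64_t keys[TOY_TOC_MAX_KEYS];
    size_t   offsets[TOY_TOC_MAX_KEYS];
} toy_toc;

/*
 * Copy a NUL-terminated string into the toc under 'key'.
 * Returns 0 on success, -1 if there is no room left.
 */
static int
toy_toc_store(toy_toc *toc, uint64_t key, const char *data)
{
    size_t len = strlen(data) + 1;  /* +1 so the terminator travels too */

    if (toc->nkeys >= TOY_TOC_MAX_KEYS || toc->used + len > TOY_TOC_SPACE)
        return -1;
    memcpy(toc->space + toc->used, data, len);
    toc->keys[toc->nkeys] = key;
    toc->offsets[toc->nkeys] = toc->used;
    toc->nkeys++;
    toc->used += len;
    return 0;
}

/* Return the string stored under 'key', or NULL if the key is absent. */
static const char *
toy_toc_lookup(const toy_toc *toc, uint64_t key)
{
    for (int i = 0; i < toc->nkeys; i++)
        if (toc->keys[i] == key)
            return toc->space + toc->offsets[i];
    return NULL;
}
```

The point of the sketch is only the length accounting: the space reserved must include the terminator, because the reader has nothing but the string itself to find its end.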
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 88d0ea3adb..b0eb15b982 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1749,8 +1755,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1767,6 +1775,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * to be executed of the parent plan node to which the PartitionPruneInfo
+ * belongs and also the set of the RT indexes of leaf partitions that will
+ * be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1787,8 +1802,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1801,9 +1817,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1819,20 +1836,57 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
+ Assert(IsA(pruneresult, PartitionPruneResult));
+ }
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1840,7 +1894,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1856,11 +1911,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires the execution to have
+ * started and thus need the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1874,19 +1992,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1941,15 +2061,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before
+ * execution has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key from its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1963,6 +2110,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1973,6 +2121,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2023,6 +2173,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2030,6 +2182,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2051,7 +2204,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2061,7 +2214,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2289,10 +2442,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2327,7 +2484,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2341,6 +2498,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2351,13 +2510,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2384,8 +2545,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2393,7 +2560,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
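
For readers following the execPartition.c changes above: find_matching_subplans_recurse() now records, for every surviving leaf partition, both its subplan index and its RT index, and descends into surviving sub-partitioned tables. A minimal standalone sketch of that traversal is below. It assumes 64-bit masks in place of Bitmapset and a precomputed "surviving" mask in place of actually running pruning steps; toy_prune_rel and toy_find_matching are hypothetical names, not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_PARTS 8

/* Toy stand-in for PartitionedRelPruningData (illustrative only). */
typedef struct toy_prune_rel
{
    int      nparts;
    uint64_t surviving;                  /* bit i set: partition i survived */
    int      subplan_map[TOY_MAX_PARTS]; /* subplan index, or -1 */
    int      subpart_map[TOY_MAX_PARTS]; /* index into rels[], or -1 */
    unsigned rti_map[TOY_MAX_PARTS];     /* RT index of leaf, or 0 */
} toy_prune_rel;

/*
 * Add surviving subplan indexes to *validsubplans and, if the caller
 * passed a non-NULL pointer, the matching leaf RT indexes to
 * *scan_leafpart_rtis.  Assumes all indexes are below 64.
 */
static void
toy_find_matching(const toy_prune_rel *rels, int relidx,
                  uint64_t *validsubplans, uint64_t *scan_leafpart_rtis)
{
    const toy_prune_rel *rel = &rels[relidx];

    for (int i = 0; i < rel->nparts; i++)
    {
        if ((rel->surviving & (UINT64_C(1) << i)) == 0)
            continue;           /* pruned away */

        if (rel->subplan_map[i] >= 0)
        {
            *validsubplans |= UINT64_C(1) << rel->subplan_map[i];
            if (scan_leafpart_rtis != NULL)
                *scan_leafpart_rtis |= UINT64_C(1) << rel->rti_map[i];
        }
        else if (rel->subpart_map[i] >= 0)
            toy_find_matching(rels, rel->subpart_map[i],
                              validsubplans, scan_leafpart_rtis);
    }
}
```

Passing NULL for the second output mirrors the executor-time callers in nodeAppend.c and nodeMergeAppend.c, which only need the subplan set.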
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 572c87e453..044bf3f491 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -135,6 +135,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 23776367c5..b01f55fb4f 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -800,7 +805,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 799602f5ea..a96d316dca 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 399c1812d4..44ffe71c49 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -353,6 +363,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
ListCell *l;
+ Bitmapset *leafpart_rtis = NULL;
pruneinfo->root_parent_relids =
offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
@@ -364,15 +375,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
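
For readers following the setrefs.c changes above: minLockRelids starts as the full range of the query's RT indexes, the leaf partitions subject to initial pruning are then removed from it, and AcquireExecutorLocks() is expected to add back only the leaves that survive pruning. A minimal standalone sketch of that set arithmetic is below. It assumes 64-bit masks in place of Bitmapset and at most 63 range table entries; toy_range, toy_min_lock_relids and toy_relids_to_lock are hypothetical names, not functions from the patch.

```c
#include <stdint.h>

/* Mask with bits lo..hi set, inclusive; assumes 0 <= lo <= hi <= 63. */
static uint64_t
toy_range(int lo, int hi)
{
    uint64_t upto_hi = (hi == 63) ? UINT64_MAX
                                  : ((UINT64_C(1) << (hi + 1)) - 1);
    uint64_t below_lo = (UINT64_C(1) << lo) - 1;

    return upto_hi & ~below_lo;
}

/* Plan time: every RT index 1..nrels except the prunable leaf partitions. */
static uint64_t
toy_min_lock_relids(int nrels, uint64_t prunable_leaf_rtis)
{
    return toy_range(1, nrels) & ~prunable_leaf_rtis;
}

/* Lock time: add back the leaves that survived initial pruning. */
static uint64_t
toy_relids_to_lock(uint64_t min_lock_relids, uint64_t surviving_leaf_rtis)
{
    return min_lock_relids | surviving_leaf_rtis;
}
```

With five range table entries where RT indexes 2 to 4 are prunable leaves, the plan-time set holds only indexes 1 and 5; if pruning keeps leaf 3, the set to lock becomes 1, 3 and 5.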
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..d5556354f7 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +477,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +568,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +645,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +678,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +691,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +706,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +731,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..95ab1d0eef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..280ed7d239 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,18 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth(portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..af6fae6e3b 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst(lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that no
+ * partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults to add yet, though we must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResult or NIL is added to
+ * *part_prune_results_list. It is the former if the PlannedStmt is from
+ * an existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * the "initial" pruning steps are performed by calling
+ * ExecutorDoInitialPruning(), so that AcquireExecutorLocks() locks only
+ * those leaf partitions whose subplans survive the "initial" pruning
+ * conditions. For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the bitmapset
+ * of the indexes of surviving subplans is added to the List for the
+ * PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,41 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, NULL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..5b9098971b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ Assert(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -127,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index aaf2bc78b9..32bbbc5927 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 71248a9466..9c6e8f5e13 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 1f33902947..c2f2544df5 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -218,6 +218,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index dbaa9bb54d..e0e5c15b09 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c36a15bd09..714e2cf2c7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in the
* plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1414,6 +1423,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1425,6 +1441,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1469,6 +1487,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1553,6 +1574,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before the execution has started. A module that needs to do so
+ * should call ExecutorDoInitialPruning() on a given PlannedStmt, which
+ * returns a List of PartitionPruneResult containing an entry for each
+ * PartitionPruneInfo present in PlannedStmt.partPruneInfos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
[application/octet-stream] v27-0002-Add-root_parent_relids-to-PartitionPruneResult.patch (3.4K, 3-v27-0002-Add-root_parent_relids-to-PartitionPruneResult.patch)
From 4ef1d918405a7c7c63a3e7376ccef57cf844796d Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 2 Dec 2022 19:32:14 +0900
Subject: [PATCH v27 2/2] Add root_parent_relids to PartitionPruneResult
It's the same as the corresponding PartitionPruneInfo's root_parent_relids.
Like PartitionPruneInfo.root_parent_relids, it's there for
cross-checking that a PartitionPruneResult found at a given plan node's
part_prune_index actually matches the plan node.
---
src/backend/executor/execMain.c | 2 ++
src/backend/executor/execPartition.c | 13 +++++++++++--
src/include/nodes/plannodes.h | 7 +++++++
3 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4d8c8e2e43..3293a65d15 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -147,6 +147,8 @@ ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
PartitionPruneInfo *pruneinfo = lfirst(lc);
PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+ pruneresult->root_parent_relids =
+ bms_copy(pruneinfo->root_parent_relids);
pruneresult->valid_subplan_offs =
ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
scan_leafpart_rtis);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index b0eb15b982..2eadc30ec8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1843,8 +1843,17 @@ ExecInitPartitionPruning(PlanState *planstate,
*/
if (estate->es_part_prune_results)
{
- pruneresult = list_nth(estate->es_part_prune_results, part_prune_index);
- Assert(IsA(pruneresult, PartitionPruneResult));
+ pruneresult = list_nth_node(PartitionPruneResult,
+ estate->es_part_prune_results,
+ part_prune_index);
+ if (!bms_equal(pruneresult->root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("pruneresult relids %s, pruneinfo relids %s",
+ bmsToString(pruneresult->root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
}
if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 714e2cf2c7..ed664c5469 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1580,6 +1580,12 @@ typedef struct PartitionPruneStepCombine
* The result of performing ExecPartitionDoInitialPruning() on a given
* PartitionPruneInfo.
*
+ * root_parent_relids is the same as PartitionPruneInfo.root_parent_relids. It's
+ * there for cross-checking in ExecInitPartitionPruning() that the
+ * PartitionPruneResult and the PartitionPruneInfo at a given index in
+ * EState.es_part_prune_results and EState.es_part_prune_infos, respectively,
+ * belong to the same parent plan node.
+ *
* valid_subplan_offs contains the indexes of subplans remaining after
* performing initial pruning by calling ExecFindMatchingSubPlans() on the
* PartitionPruneInfo.
@@ -1597,6 +1603,7 @@ typedef struct PartitionPruneResult
{
NodeTag type;
+ Bitmapset *root_parent_relids;
Bitmapset *valid_subplan_offs;
} PartitionPruneResult;
--
2.35.3
* Re: generic plans and "initial" pruning
@ 2022-12-05 06:08 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-12-05 06:08 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Mon, Dec 5, 2022 at 12:00 PM Amit Langote <[email protected]> wrote:
> On Fri, Dec 2, 2022 at 7:40 PM Amit Langote <[email protected]> wrote:
> > Thought it might be good for PartitionPruneResult to also have
> > root_parent_relids that matches with the corresponding
> > PartitionPruneInfo. ExecInitPartitionPruning() does a sanity check
> > that the root_parent_relids of a given pair of PartitionPrune{Info |
> > Result} match.
> >
> > Posting the patch separately as the attached 0002, just in case you
> > might think that the extra cross-checking would be an overkill.
>
> Rebased over 92c4dafe1eed and fixed some factual mistakes in the
> comment above ExecutorDoInitialPruning().
Sorry, I had forgotten to git-add hunks including some cosmetic
changes in that one. Here's another version.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v28-0002-Add-root_parent_relids-to-PartitionPruneResult.patch (3.3K, 2-v28-0002-Add-root_parent_relids-to-PartitionPruneResult.patch)
From 04f156396309f8c34a853ce1ad4e293fe4e2c4a2 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Fri, 2 Dec 2022 19:32:14 +0900
Subject: [PATCH v28 2/2] Add root_parent_relids to PartitionPruneResult
It's the same as the corresponding PartitionPruneInfo's root_parent_relids.
Like PartitionPruneInfo.root_parent_relids, it's there for
cross-checking that a PartitionPruneResult found at a given plan node's
part_prune_index actually matches the plan node.
---
src/backend/executor/execMain.c | 2 ++
src/backend/executor/execPartition.c | 10 ++++++++++
src/include/nodes/plannodes.h | 7 +++++++
3 files changed, 19 insertions(+)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f15265716a..554623751b 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -147,6 +147,8 @@ ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+ pruneresult->root_parent_relids =
+ bms_copy(pruneinfo->root_parent_relids);
pruneresult->valid_subplan_offs =
ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
scan_leafpart_rtis);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index bc8331a222..2eadc30ec8 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1842,9 +1842,19 @@ ExecInitPartitionPruning(PlanState *planstate,
* is set.
*/
if (estate->es_part_prune_results)
+ {
pruneresult = list_nth_node(PartitionPruneResult,
estate->es_part_prune_results,
part_prune_index);
+ if (!bms_equal(pruneresult->root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("pruneresult relids %s, pruneinfo relids %s",
+ bmsToString(pruneresult->root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
+ }
if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
{
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 714e2cf2c7..ed664c5469 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -1580,6 +1580,12 @@ typedef struct PartitionPruneStepCombine
* The result of performing ExecPartitionDoInitialPruning() on a given
* PartitionPruneInfo.
*
+ * root_parent_relids is the same as PartitionPruneInfo.root_parent_relids. It's
+ * there for cross-checking in ExecInitPartitionPruning() that the
+ * PartitionPruneResult and the PartitionPruneInfo at a given index in
+ * EState.es_part_prune_results and EState.es_part_prune_infos, respectively,
+ * belong to the same parent plan node.
+ *
* valid_subplan_offs contains the indexes of subplans remaining after
* performing initial pruning by calling ExecFindMatchingSubPlans() on the
* PartitionPruneInfo.
@@ -1597,6 +1603,7 @@ typedef struct PartitionPruneResult
{
NodeTag type;
+ Bitmapset *root_parent_relids;
Bitmapset *valid_subplan_offs;
} PartitionPruneResult;
--
2.35.3
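For readers skimming the 0002 patch above, the cross-check it describes
can be sketched in standalone C. This is only an illustration under
stated assumptions, not the patch's code: RelidSet is a toy 64-bit
stand-in for PostgreSQL's Bitmapset, and ToyPruneInfo, ToyPruneResult,
relid_bit() and prune_result_matches() are hypothetical names invented
here. The idea shown is that the result node carries a copy of the
info node's root_parent_relids, so a consumer that looks both up by the
same index can detect that they do not belong together.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for a Bitmapset: one bit per relid, relids 0..63 only. */
typedef uint64_t RelidSet;

/* Hypothetical, heavily simplified versions of the two node types. */
typedef struct ToyPruneInfo
{
	RelidSet	root_parent_relids;
} ToyPruneInfo;

typedef struct ToyPruneResult
{
	RelidSet	root_parent_relids;	/* copied from the matching ToyPruneInfo */
	RelidSet	valid_subplan_offs;	/* subplans surviving initial pruning */
} ToyPruneResult;

/* Build a one-member set containing the given relid. */
static RelidSet
relid_bit(int relid)
{
	return (RelidSet) 1 << relid;
}

/*
 * Return true if the result stored at 'index' carries the same
 * root_parent_relids as the info stored at the same index.  On a
 * mismatch, report it on stderr and return false.
 */
static bool
prune_result_matches(const ToyPruneInfo *infos,
					 const ToyPruneResult *results,
					 int index)
{
	if (results[index].root_parent_relids != infos[index].root_parent_relids)
	{
		fprintf(stderr,
				"mismatching prune info and prune result at index %d\n",
				index);
		return false;
	}
	return true;
}
```

In the real patch a mismatch is raised with ereport(ERROR, ...) rather
than returned to the caller; the boolean return here just keeps the
sketch free of backend dependencies.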
[application/octet-stream] v28-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch (83.0K, 3-v28-0001-Optimize-AcquireExecutorLocks-by-locking-only-un.patch)
From 28bdd07ae15228bc3173257ab5968864455dda16 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v28 1/2] Optimize AcquireExecutorLocks() by locking only
unpruned partitions
This commit teaches AcquireExecutorLocks() to perform initial
partition pruning to notionally eliminate the subnodes contained in a
generic cached plan that need not be initialized during the actual
execution of the plan and skip locking the partitions scanned by those
subnodes.
The result of performing initial partition pruning this way, before the
actual execution has started, is passed to the executor as a
PartitionPruneResult, supplied along with the PlannedStmt by the
callers of the executor that used plancache.c to get the plan. It is NULL
when the plan is obtained by calling the planner directly or when the
plan obtained by plancache.c is not a generic one.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 26 ++-
src/backend/executor/README | 36 ++++
src/backend/executor/execMain.c | 53 ++++++
src/backend/executor/execParallel.c | 26 ++-
src/backend/executor/execPartition.c | 237 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 27 ++-
src/backend/nodes/readfuncs.c | 8 +-
src/backend/optimizer/plan/planner.c | 2 +
src/backend/optimizer/plan/setrefs.c | 46 +++++
src/backend/partitioning/partprune.c | 41 ++++-
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 29 ++-
src/backend/utils/cache/plancache.c | 208 +++++++++++++++++++---
src/backend/utils/mmgr/portalmem.c | 19 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 9 +-
src/include/executor/execdesc.h | 3 +
src/include/executor/executor.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 12 ++
src/include/nodes/plannodes.h | 46 +++++
src/include/utils/plancache.h | 3 +-
src/include/utils/portal.h | 3 +
33 files changed, 787 insertions(+), 98 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..29b45539d3 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -155,6 +155,7 @@ ExecuteQuery(ParseState *pstate,
PreparedStatement *entry;
CachedPlan *cplan;
List *plan_list;
+ List *part_prune_results_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
Portal portal;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +211,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -576,7 +583,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
- ListCell *p;
+ List *part_prune_results_list;
+ ListCell *p,
+ *pp;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -619,7 +628,10 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -634,13 +646,15 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
- foreach(p, plan_list)
+ forboth(p, plan_list, pp, part_prune_results_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = lfirst_node(List, pp);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..7f8cf1494f 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -65,6 +65,38 @@ found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
subnode array will become out of sequence to the plan's subplan list.
+The so-called execution time pruning may also occur even before the execution
+has actually started. One case where that occurs is when a cached generic
+plan is being validated for execution by plancache.c:GetCachedPlan(), which
+works by locking all the relations that will be scanned by that plan. If the
+generic plan contains nodes that can perform execution time partition pruning
+(that is, contain a PartitionPruneInfo), a subset of pruning steps contained
+in a given node's PartitionPruneInfo that do not depend on the execution
+actually having started (called "initial" pruning steps) are performed as part
+of the plan validation step, by calling ExecutorDoInitialPruning(). That
+returns the minimal set of child subplans that satisfy those initial pruning
+steps contained in each PartitionPruneInfo. AcquireExecutorLocks() will then
+lock only the relations scanned by those subplans, in addition to those present
+in PlannedStmt.minLockRelids. Note that the subplans are not really pruned as
+in being removed from the plan tree, so care is needed by the downstream
+users of such a plan that has undergone pre-execution initial pruning.
+
+To prevent the executor and any third party execution code that can look at
+the plan tree from trying to execute the subplans that were pruned as
+described above, the result of that pruning is passed to the executor as a
+List of PartitionPruneResult nodes via the QueryDesc, which is subsequently
+assigned to EState.es_part_prune_results. Each PartitionPruneResult therein
+consists of the set of indexes of surviving subplans in the respective parent
+plan node's (the one to which the corresponding PartitionPruneInfo belongs)
+list of child subplans, saved as a bitmapset valid_subplan_offs. The executor
+or any third party execution code working on a generic plan should not
+re-evaluate the set of initially valid subplans for a given plan node by
+redoing the initial pruning if a PartitionPruneResult belonging to that plan
+node is present in es_part_prune_results. Note that that is not simply a
+performance optimization, because such re-evaluation of the pruning steps may
+very well end up resulting in a different set of initially valid subplans,
+containing some whose relations were not locked by AcquireExecutorLocks().
+
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
read-only to the executor, but the executor state for expression evaluation
@@ -286,6 +318,10 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [ ExecutorDoInitialPruning ] --- an optional step to perform initial
+ partition pruning on the plan tree the result of which is passed
+ to the executor via QueryDesc
+
CreateQueryDesc
ExecutorStart
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 12ff4f3de5..f15265716a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -49,6 +49,7 @@
#include "commands/matview.h"
#include "commands/trigger.h"
#include "executor/execdebug.h"
+#include "executor/execPartition.h"
#include "executor/nodeSubplan.h"
#include "foreign/fdwapi.h"
#include "jit/jit.h"
@@ -104,6 +105,56 @@ static void EvalPlanQualStart(EPQState *epqstate, Plan *planTree);
/* end of local decls */
+/* ----------------------------------------------------------------
+ * ExecutorDoInitialPruning
+ *
+ * For each plan tree node that has been assigned a PartitionPruneInfo,
+ * this performs initial partition pruning using the information contained
+ * therein to determine the set of child subplans that satisfy the initial
+ * pruning steps, to be returned as a bitmapset of their indexes in the
+ * node's list of child subplans (for example, an Append's appendplans).
+ *
+ * Return value is a List of PartitionPruneResult nodes, one for each
+ * PartitionPruneInfo found in plannedstmt->partPruneInfos, each
+ * containing a bitmapset of the indexes of unpruned child subplans.
+ * A bitmapset of the RT indexes of the leaf partitions scanned by those
+ * subplans is returned in *scan_leafpart_rtis, which is shared across all
+ * of those PartitionPruneResults.
+ *
+ * The executor must see exactly the same set of subplans as valid for
+ * execution when doing ExecInitNode() on the plan nodes whose
+ * PartitionPruneInfos are processed here. So, it must get the set from the
+ * aforementioned PartitionPruneResult, instead of computing it all over
+ * again by redoing the initial pruning. It's the caller's job to pass the
+ * PartitionPruneResult to the executor.
+ *
+ * Note: Partitioned tables mentioned in PartitionedRelPruneInfo nodes that
+ * drive the pruning will be locked before doing the pruning.
+ * ----------------------------------------------------------------
+ */
+List *
+ExecutorDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *part_prune_results = NIL;
+ ListCell *lc;
+
+ /* Only get here if there is any pruning to do. */
+ Assert(plannedstmt->containsInitialPruning);
+
+ foreach(lc, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, params, pruneinfo,
+ scan_leafpart_rtis);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+
+ return part_prune_results;
+}
/* ----------------------------------------------------------------
* ExecutorStart
@@ -806,6 +857,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -826,6 +878,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aca0c6f323..917079a034 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -182,6 +183,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false;
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
@@ -597,12 +599,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -631,6 +636,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -657,6 +663,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -751,6 +762,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1232,8 +1249,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1244,12 +1263,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 88d0ea3adb..bc8331a222 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1749,8 +1755,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
- * require us to prune separately for each scan of the parent plan node.
+ * done once during executor startup or during ExecutorDoInitialPruning() that
+ * runs as part of performing AcquireExecutorLocks() on a given plan tree.
+ * Expressions that do involve such Params require us to prune separately for
+ * each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
* added benefit of not having to initialize the unneeded subplans at all.
@@ -1767,6 +1775,13 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the minimal set of child subplans
+ * to be executed of the parent plan node to which the PartitionPruneInfo
+ * belongs and also the set of the RT indexes of leaf partitions that will
+ * be scanned with those subplans.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1787,8 +1802,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * Initial pruning is performed here if needed (unless it has already been done
+ * by ExecutorDoInitialPruning()), and in that case only the surviving
+ * subplans' indexes are added.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1801,9 +1817,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1819,20 +1836,56 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /*
+ * No need to do initial pruning if it was done already by
+ * ExecutorDoInitialPruning(), which it would be if es_part_prune_results
+ * is set.
+ */
+ if (estate->es_part_prune_results)
+ pruneresult = list_nth_node(PartitionPruneResult,
+ estate->es_part_prune_results,
+ part_prune_index);
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1840,7 +1893,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1856,11 +1910,74 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the minimal set of child subplans that will be executed and also the
+ * set of RT indexes of the leaf partitions scanned by those subplans.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ MemoryContext oldcontext,
+ tmpcontext;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /*
+ * A temporary context for memory allocations required while executing
+ * partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "initial pruning working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+
+ /*
+ * PartitionDirectory to look up partition descriptors.
+ * Note that we don't omit detached partitions, just like during
+ * execution proper.
+ */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+
+ /*
+ * We don't yet have a PlanState for the parent plan node, so we must
+ * create a standalone ExprContext to evaluate pruning expressions,
+ * equipped with the information about the EXTERN parameters that the
+ * caller passed us. Note that that's okay because the initial pruning
+ * steps do not contain anything that requires execution to have
+ * started and thus needs the information contained in a PlanState.
+ */
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ MemoryContextSwitchTo(oldcontext);
+
+ /* Do the initial pruning. */
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+ MemoryContextDelete(tmpcontext);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1874,19 +1991,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1941,15 +2060,42 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before execution
+ * has started, such as when called during
+ * ExecutorDoInitialPruning() on a cached plan. In that case,
+ * sub-partitions must be locked, because AcquirePlannerLocks()
+ * would not have seen them. (1st relation in a partrelpruneinfos
+ * list is always the root partitioned table appearing in the
+ * query, which AcquirePlannerLocks() would have locked; the
+ * Assert in relation_open() guards that assumption.)
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -1963,6 +2109,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1973,6 +2120,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2023,6 +2172,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2030,6 +2181,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
@@ -2051,7 +2203,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2061,7 +2213,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2289,10 +2441,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2327,7 +2483,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2341,6 +2497,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2351,13 +2509,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2384,8 +2544,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2393,7 +2559,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 572c87e453..044bf3f491 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -135,6 +135,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..93012a5b3b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1578,6 +1578,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
CachedPlanSource *plansource;
CachedPlan *cplan;
List *stmt_list;
+ List *part_prune_results_list;
char *query_string;
Snapshot snapshot;
MemoryContext oldcontext;
@@ -1657,7 +1658,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1689,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2092,7 +2099,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL /* Not interested in PartitionPruneResults */);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2473,7 +2481,9 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
{
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
- ListCell *lc2;
+ List *part_prune_results_list;
+ ListCell *lc2,
+ *lc3;
spicallbackarg.query = plansource->query_string;
@@ -2549,8 +2559,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ plan_owner, _SPI_current->queryEnv,
+ &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
stmt_list = cplan->stmt_list;
/*
@@ -2589,9 +2601,10 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
- foreach(lc2, stmt_list)
+ forboth(lc2, stmt_list, lc3, part_prune_results_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = lfirst_node(List, lc3);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2663,7 +2676,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 23776367c5..b01f55fb4f 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -800,7 +805,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 799602f5ea..a96d316dca 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -520,7 +520,9 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 399c1812d4..44ffe71c49 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -270,6 +270,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -353,6 +363,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
ListCell *l;
+ Bitmapset *leafpart_rtis = NULL;
pruneinfo->root_parent_relids =
offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
@@ -364,15 +375,50 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also adjust RT indexes of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the global set of relations
+ * to be locked before executing the plan. AcquireExecutorLocks()
+ * will find the ones to add to the set after performing initial
+ * pruning.
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..d5556354f7 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,18 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate that the
+ * returned PartitionedRelPruneInfos contain pruning steps that can be
+ * performed before and after execution begins, respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +477,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +568,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we determine whether the 2nd pass is necessary by
+ * checking for the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +645,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +678,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +691,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +706,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +731,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 3082093d1e..95ab1d0eef 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ List *part_prune_results_list;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &part_prune_results_list);
+ Assert(list_length(cplan->stmt_list) ==
+ list_length(part_prune_results_list));
/*
* Now we can define the portal.
@@ -1987,6 +1990,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ /* Copy Lists of PartitionPruneResults into the portal's context. */
+ PortalStorePartitionPruneResults(portal, part_prune_results_list);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..f582ff177b 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results; /* ExecutorDoInitialPruning()
+ * output for plan */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +125,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: ExecutorDoInitialPruning() output for the PlannedStmt
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +138,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +150,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +496,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->part_prune_results_list == NIL ? NIL :
+ linitial(portal->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1235,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1283,19 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->part_prune_results_list != NIL)
+ part_prune_results = list_nth_node(List,
+ portal->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1304,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..8ff42153a1 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -99,14 +99,19 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv);
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt);
+static void ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -782,6 +787,26 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
return tlist;
}
+/*
+ * FreePartitionPruneResults
+ * Frees the List of Lists of PartitionPruneResults for CheckCachedPlan()
+ */
+static void
+FreePartitionPruneResults(List *part_prune_results_list)
+{
+ ListCell *lc;
+
+ foreach(lc, part_prune_results_list)
+ {
+ List *part_prune_results = lfirst_node(List, lc);
+
+ /* Free both the PartitionPruneResults and the containing List. */
+ list_free_deep(part_prune_results);
+ }
+
+ list_free(part_prune_results_list);
+}
+
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
@@ -790,15 +815,20 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
*
* On a "true" return, we have acquired the locks needed to run the plan.
* (We must do this for the "true" result to be race-condition-free.)
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
+ List **part_prune_results_list)
{
CachedPlan *plan = plansource->gplan;
/* Assert that caller checked the querytree */
Assert(plansource->is_valid);
+ *part_prune_results_list = NIL;
+
/* If there's no generic plan, just say "false" */
if (!plan)
return false;
@@ -820,13 +850,21 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ List *lockedRelids_per_stmt;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Lock relations scanned by the plan. This is where the pruning
+ * happens if needed.
+ */
+ AcquireExecutorLocks(plan->stmt_list, boundParams,
+ part_prune_results_list,
+ &lockedRelids_per_stmt);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +886,11 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ ReleaseExecutorLocks(plan->stmt_list, lockedRelids_per_stmt);
+
+ /* Release any PartitionPruneResults that may have been created. */
+ FreePartitionPruneResults(*part_prune_results_list);
+ *part_prune_results_list = NIL;
}
/*
@@ -874,10 +916,14 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
+ *
+ * A list of NILs is returned in *part_prune_results_list, meaning that no
+ * partition pruning has been done yet for the plans in stmt_list.
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
- ParamListInfo boundParams, QueryEnvironment *queryEnv)
+ ParamListInfo boundParams, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan;
List *plist;
@@ -1007,6 +1053,17 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
MemoryContextSwitchTo(oldcxt);
+ /*
+ * No actual PartitionPruneResults yet to add, though must initialize
+ * the list to have the same number of elements as the list of
+ * PlannedStmts.
+ */
+ *part_prune_results_list = NIL;
+ foreach(lc, plist)
+ {
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
+ }
+
return plan;
}
@@ -1126,6 +1183,19 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
+ * For every PlannedStmt found in the returned CachedPlan, an element that
+ * is either a List of PartitionPruneResult or NIL is added to
+ * *part_prune_results_list. It is the former if the PlannedStmt is from
+ * the existing CachedPlan that is otherwise valid and has
+ * containsInitialPruning set to true. Before returning such a CachedPlan,
+ * those "initial" steps are performed by calling ExecutorDoInitialPruning()
+ * to determine only those leaf partitions that need to be locked by
+ * AcquireExecutorLocks() by pruning away subplans that don't match the
+ * "initial" pruning conditions. For each PartitionPruneInfo found in
+ * PlannedStmt.partPruneInfos, a PartitionPruneResult containing the bitmapset
+ * of the indexes of surviving subplans is added to the List for the
+ * PlannedStmt.
+ *
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
@@ -1139,11 +1209,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ List **part_prune_results_list)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ List *my_part_prune_results_list;
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
@@ -1160,7 +1232,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, boundParams,
+ &my_part_prune_results_list))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1169,7 +1242,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
else
{
/* Build a new generic plan */
- plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, NULL, queryEnv,
+ &my_part_prune_results_list);
/* Just make real sure plansource->gplan is clear */
ReleaseGenericPlan(plansource);
/* Link the new generic plan into the plansource */
@@ -1214,7 +1288,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (customplan)
{
/* Build a custom plan */
- plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv);
+ plan = BuildCachedPlan(plansource, qlist, boundParams, queryEnv,
+ &my_part_prune_results_list);
/* Accumulate total costs of custom plans */
plansource->total_custom_cost += cached_plan_cost(plan, true);
@@ -1246,6 +1321,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
plan->is_saved = true;
}
+ if (part_prune_results_list)
+ *part_prune_results_list = my_part_prune_results_list;
+
return plan;
}
@@ -1737,17 +1815,29 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ *
+ * See GetCachedPlan()'s comment for a description of part_prune_results_list.
+ *
+ * On return, *lockedRelids_per_stmt will contain a bitmapset for every
+ * PlannedStmt in stmt_list, containing the RT indexes of relation entries
+ * in its range table that were actually locked, or NULL if the PlannedStmt
+ * contains a utility statement.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocks(List *stmt_list, ParamListInfo boundParams,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
{
ListCell *lc1;
+ *part_prune_results_list = *lockedRelids_per_stmt = NIL;
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ List *part_prune_results = NIL;
+ Bitmapset *allLockRelids;
+ Bitmapset *lockedRelids = NULL;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1761,13 +1851,40 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
+ *part_prune_results_list = lappend(*part_prune_results_list, NIL);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ /*
+ * Figure out the set of relations that would need to be locked
+ * before executing the plan.
+ */
+ if (plannedstmt->containsInitialPruning)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ Bitmapset *scan_leafpart_rtis = NULL;
+
+ /*
+ * Obtain the set of leaf partitions to be locked.
+ *
+ * The following does initial partition pruning using the
+ * PartitionPruneInfos found in plannedstmt->partPruneInfos and
+ * finds leaf partitions that survive that pruning across all the
+ * nodes in the plan tree.
+ */
+ part_prune_results = ExecutorDoInitialPruning(plannedstmt,
+ boundParams,
+ &scan_leafpart_rtis);
+ allLockRelids = bms_union(plannedstmt->minLockRelids,
+ scan_leafpart_rtis);
+ }
+ else
+ allLockRelids = plannedstmt->minLockRelids;
+
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
@@ -1778,10 +1895,59 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
* fail if it's been dropped entirely --- we'll just transiently
* acquire a non-conflicting lock.
*/
- if (acquire)
- LockRelationOid(rte->relid, rte->rellockmode);
- else
- UnlockRelationOid(rte->relid, rte->rellockmode);
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ *part_prune_results_list = lappend(*part_prune_results_list,
+ part_prune_results);
+ *lockedRelids_per_stmt = lappend(*lockedRelids_per_stmt, lockedRelids);
+ }
+}
+
+/*
+ * ReleaseExecutorLocks
+ * Release locks that would've been acquired by an earlier call to
+ * AcquireExecutorLocks()
+ */
+static void
+ReleaseExecutorLocks(List *stmt_list, List *lockedRelids_per_stmt)
+{
+ ListCell *lc1,
+ *lc2;
+
+ forboth(lc1, stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockedRelids = lfirst_node(Bitmapset, lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, except those (such as EXPLAIN) that
+ * contain a parsed-but-not-planned query. Note: it's okay to use
+ * ScanQueryForLocks, even though the query hasn't been through
+ * rule rewriting, because rewriting doesn't change the query
+ * representation.
+ */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ Assert(lockedRelids == NULL);
+ if (query)
+ ScanQueryForLocks(query, false);
+ continue;
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) >= 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /* See the comment in AcquireExecutorLocks(). */
+ UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
}
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..5b9098971b 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,25 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * PortalStorePartitionPruneResults
+ * Copy the given List of Lists of PartitionPruneResults into the
+ * portal's context
+ *
+ * This allows the caller to ensure that the list exists as long as the portal
+ * does.
+ */
+void
+PortalStorePartitionPruneResults(Portal portal, List *part_prune_results_list)
+{
+ MemoryContext oldcxt;
+
+ Assert(PortalIsValid(portal));
+ oldcxt = MemoryContextSwitchTo(portal->portalContext);
+ portal->part_prune_results_list = copyObject(part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
@@ -127,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..7d4379da7b 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* ExecutorDoInitialPruning()'s
+ * output for plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index aaf2bc78b9..32bbbc5927 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -185,6 +185,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
+extern List *ExecutorDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ Bitmapset **scan_leafpart_rtis);
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 71248a9466..9c6e8f5e13 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -619,6 +619,7 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 1f33902947..c2f2544df5 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -218,6 +218,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index dbaa9bb54d..e0e5c15b09 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -125,6 +125,18 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries minus indexes of range table entries
+ * of the leaf partitions scanned by prunable subplans; see
+ * AcquireExecutorLocks()
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c36a15bd09..714e2cf2c7 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,8 +73,17 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in the
* plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries minus
+ * indexes of range table entries of the leaf
+ * partitions scanned by prunable subplans;
+ * see AcquireExecutorLocks() */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1414,6 +1423,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1425,6 +1441,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1469,6 +1487,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
@@ -1553,6 +1574,31 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before the execution has started. A module that needs to do so
+ * should call ExecutorDoInitialPruning() on a given PlannedStmt, which
+ * returns a List of PartitionPruneResult containing an entry for each
+ * PartitionPruneInfo present in PlannedStmt.partPruneInfos. The module
+ * should then pass that list, along with the PlannedStmt, to the executor,
+ * so that it can reuse the result of initial partition pruning when
+ * initializing the subplans for execution.
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..32579d4788 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -220,7 +220,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ List **part_prune_results_list);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..1901fc5f28 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,7 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ List *part_prune_results_list; /* List of Lists of PartitionPruneResults */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +243,8 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalStorePartitionPruneResults(Portal portal,
+ List *part_prune_results_list);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
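
The lock bookkeeping that the patch above introduces in AcquireExecutorLocks() and ReleaseExecutorLocks() can be sketched outside PostgreSQL as follows. This is only an illustration under stated assumptions, not code from the patch: a uint64_t stands in for a Bitmapset of range table indexes, a counter array stands in for the lock manager, and every name here (toy_acquire_locks, toy_release_locks, toy_locks_held, lock_count) is hypothetical. The point it tries to show is that the acquire step records exactly which entries it locked, so that the release step does not have to repeat the pruning.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_RTI 64

/* Toy lock manager: lock_count[i] is the number of locks held on RT index i. */
static int lock_count[TOY_MAX_RTI];

/*
 * Lock the always-needed relations plus the leaf partitions that survived
 * "initial" pruning, and return the set that was actually locked.
 * Bit i of a mask represents range table index i; bit 0 is unused because
 * range table indexes start at 1.
 */
static uint64_t
toy_acquire_locks(uint64_t min_lock_relids, uint64_t surviving_leafparts)
{
    uint64_t all_lock_relids = min_lock_relids | surviving_leafparts;
    uint64_t locked = 0;

    for (int rti = 1; rti < TOY_MAX_RTI; rti++)
    {
        uint64_t bit = UINT64_C(1) << rti;

        if (all_lock_relids & bit)
        {
            lock_count[rti]++;
            locked |= bit;
        }
    }
    return locked;
}

/* Release exactly the locks recorded by an earlier toy_acquire_locks(). */
static void
toy_release_locks(uint64_t locked)
{
    for (int rti = 1; rti < TOY_MAX_RTI; rti++)
    {
        if (locked & (UINT64_C(1) << rti))
        {
            assert(lock_count[rti] > 0);
            lock_count[rti]--;
        }
    }
}

/* Total number of locks currently held in the toy lock manager. */
static int
toy_locks_held(void)
{
    int total = 0;

    for (int rti = 1; rti < TOY_MAX_RTI; rti++)
        total += lock_count[rti];
    return total;
}
```

With RT index 1 as the always-locked parent and RT indexes 3 and 5 as the surviving leaf partitions, toy_acquire_locks() takes three locks and leaves the pruned partitions untouched; passing its return value to toy_release_locks() undoes exactly those three.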
* Re: generic plans and "initial" pruning
@ 2022-12-06 19:00 Alvaro Herrera <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Alvaro Herrera @ 2022-12-06 19:00 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
I find the API of GetCachedPlan a little weird after this patch. I
think it may be better to have it return a pointer to a new struct --
one that contains both the CachedPlan pointer and the list of pruning
results. (As I understand, the sole caller that isn't interested in the
pruning results, SPI_plan_get_cached_plan, can be explained by the fact
that it knows there won't be any. So I don't think we need to worry
about this case?)
And I think you should make that struct also be the last argument of
PortalDefineQuery, so you don't need the separate
PortalStorePartitionPruneResults function -- because as far as I can
tell, the callers that pass a non-NULL pointer there are exactly the
same ones that later call PortalStorePartitionPruneResults.
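
A minimal sketch of the struct-return idea described above, assuming opaque stand-ins for CachedPlan and List so that it compiles outside the PostgreSQL tree. The struct name RunnablePlan and the helper runnable_plan_has_pruning_results() are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins; the real definitions live in the PostgreSQL headers. */
typedef struct CachedPlan CachedPlan;
typedef struct List List;

/*
 * Hypothetical return type bundling the plan with the pruning results
 * computed while locking its relations.
 */
typedef struct RunnablePlan
{
    CachedPlan *cplan;                      /* the (read-only) cached plan */
    List       *part_prune_results_list;    /* one sublist per PlannedStmt,
                                             * or NULL if nothing was pruned */
} RunnablePlan;

/* True if the caller received pruning results along with the plan. */
static int
runnable_plan_has_pruning_results(const RunnablePlan *rp)
{
    return rp != NULL && rp->part_prune_results_list != NULL;
}
```

A caller that does not care about pruning results would simply ignore the second field rather than pass a NULL output pointer.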
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La primera ley de las demostraciones en vivo es: no trate de usar el sistema.
Escriba un guión que no toque nada para no causar daños." (Jakob Nielsen)
* Re: generic plans and "initial" pruning
@ 2022-12-09 08:26 Amit Langote <[email protected]>
parent: Alvaro Herrera <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-12-09 08:26 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
Thanks for the review.
On Wed, Dec 7, 2022 at 4:00 AM Alvaro Herrera <[email protected]> wrote:
> I find the API of GetCachedPlan a little weird after this patch. I
> think it may be better to have it return a pointer to a new struct --
> one that contains both the CachedPlan pointer and the list of pruning
> results. (As I understand, the sole caller that isn't interested in the
> pruning results, SPI_plan_get_cached_plan, can be explained by the fact
> that it knows there won't be any. So I don't think we need to worry
> about this case?)
David, in his Apr 7 reply on this thread, also seemed to suggest
something similar.
Hmm, I was / am not so sure if GetCachedPlan() should return something
that is not CachedPlan. An idea I had today was to replace the
part_prune_results_list output List parameter with, say,
QueryInitPruningResult, or something like that and put the current
list into that struct. Was looking at QueryEnvironment to come up
with *that* name. Any thoughts?
> And I think you should make that struct also be the last argument of
> PortalDefineQuery, so you don't need the separate
> PortalStorePartitionPruneResults function -- because as far as I can
> tell, the callers that pass a non-NULL pointer there are exactly the
> same ones that later call PortalStorePartitionPruneResults.
Yes, it would be better to not need PortalStorePartitionPruneResults.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2022-12-09 09:52 Alvaro Herrera <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Alvaro Herrera @ 2022-12-09 09:52 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On 2022-Dec-09, Amit Langote wrote:
> On Wed, Dec 7, 2022 at 4:00 AM Alvaro Herrera <[email protected]> wrote:
> > I find the API of GetCachedPlan a little weird after this patch.
> David, in his Apr 7 reply on this thread, also seemed to suggest
> something similar.
>
> Hmm, I was / am not so sure if GetCachedPlan() should return something
> that is not CachedPlan. An idea I had today was to replace the
> part_prune_results_list output List parameter with, say,
> QueryInitPruningResult, or something like that and put the current
> list into that struct. Was looking at QueryEnvironment to come up
> with *that* name. Any thoughts?
Remind me again why part_prune_results_list is not part of struct
CachedPlan then? I tried to understand that based on comments upthread,
but I was unable to find anything.
(My first reaction to your above comment was "well, rename GetCachedPlan
then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
in any way a structure that must be "immutable" in the way parser output
is. Looking at the comment at the top of plancache.c it appears to me
that it isn't, but maybe I'm missing something.)
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"The Postgresql hackers have what I call a "NASA space shot" mentality.
Quite refreshing in a world of "weekend drag racer" developers."
(Scott Marlowe)
* Re: generic plans and "initial" pruning
@ 2022-12-09 10:34 Amit Langote <[email protected]>
parent: Alvaro Herrera <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-12-09 10:34 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Fri, Dec 9, 2022 at 6:52 PM Alvaro Herrera <[email protected]> wrote:
> On 2022-Dec-09, Amit Langote wrote:
> > On Wed, Dec 7, 2022 at 4:00 AM Alvaro Herrera <[email protected]> wrote:
> > > I find the API of GetCachedPlan a little weird after this patch.
>
> > David, in his Apr 7 reply on this thread, also seemed to suggest
> > something similar.
> >
> > Hmm, I was / am not so sure if GetCachedPlan() should return something
> > that is not CachedPlan. An idea I had today was to replace the
> > part_prune_results_list output List parameter with, say,
> > QueryInitPruningResult, or something like that and put the current
> > list into that struct. Was looking at QueryEnvironment to come up
> > with *that* name. Any thoughts?
>
> Remind me again why part_prune_results_list is not part of struct
> CachedPlan then? I tried to understand that based on comments upthread,
> but I was unable to find anything.
It used to be part of CachedPlan for a brief period of time (in patch
v12 I posted in [1]), but David, in his reply to [1], said he wasn't
so sure that it belonged there.
> (My first reaction to your above comment was "well, rename GetCachedPlan
> then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
> in any way a structure that must be "immutable" in the way parser output
> is. Looking at the comment at the top of plancache.c it appears to me
> that it isn't, but maybe I'm missing something.)
CachedPlan *is* supposed to be read-only per the comment above
CachedPlanSource definition:
* ...If we are using a generic
* cached plan then it is meant to be re-used across multiple executions, so
* callers must always treat CachedPlans as read-only.
FYI, there was even an idea of putting a PartitionPruneResults for a
given PlannedStmt into the PlannedStmt itself [2], but PlannedStmt is
supposed to be read-only too [3].
Maybe we need some new overarching context when invoking plancache, if
Portal can't already be it, whose struct can be passed to
GetCachedPlan() to put the pruning results in? Perhaps,
GetRunnablePlan() that you floated could be a wrapper for
GetCachedPlan(), owning that new context.
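
The wrapper idea above could look roughly like the sketch below. It is only an illustration under stated assumptions: PlanExecContext, stub_get_cached_plan(), and sketch_get_runnable_plan() are hypothetical names, and the stub merely imitates the output-parameter shape that GetCachedPlan() has in the patch. What it tries to show is that the shared, read-only CachedPlan stays untouched while each invocation gets its own place to keep pruning results.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles outside the PostgreSQL tree. */
typedef struct CachedPlan { int dummy; } CachedPlan;
typedef struct List List;

/* Hypothetical per-invocation context that owns the pruning results. */
typedef struct PlanExecContext
{
    const CachedPlan *cplan;                /* shared, must stay read-only */
    List       *part_prune_results_list;    /* private to this invocation */
} PlanExecContext;

/* One generic plan shared by all invocations, as in the plan cache. */
static CachedPlan shared_generic_plan;

/*
 * Stub with the same output-parameter shape as the patched GetCachedPlan():
 * returns the shared plan and reports pruning results separately.
 */
static CachedPlan *
stub_get_cached_plan(List **part_prune_results_list)
{
    *part_prune_results_list = NULL;    /* no pruning done in this stub */
    return &shared_generic_plan;
}

/* Hypothetical wrapper: fills the caller's context instead of the plan. */
static void
sketch_get_runnable_plan(PlanExecContext *cxt)
{
    cxt->cplan = stub_get_cached_plan(&cxt->part_prune_results_list);
}
```

Two contexts filled this way point at the same plan but hold separate result lists, which is the property that keeps CachedPlan read-only.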
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
[1] https://www.postgresql.org/message-id/CA%2BHiwqH4qQ_YVROr7TY0jSCuGn0oHhH79_DswOdXWN5UnMCBtQ%40mail.g...
[2] https://www.postgresql.org/message-id/CAApHDvp_DjVVkgSV24%2BUF7p_yKWeepgoo%2BW2SWLLhNmjwHTVYQ%40mail...
[3] https://www.postgresql.org/message-id/922566.1648784745%40sss.pgh.pa.us
* Re: generic plans and "initial" pruning
@ 2022-12-09 10:49 Alvaro Herrera <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Alvaro Herrera @ 2022-12-09 10:49 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On 2022-Dec-09, Amit Langote wrote:
> On Fri, Dec 9, 2022 at 6:52 PM Alvaro Herrera <[email protected]> wrote:
> > Remind me again why part_prune_results_list is not part of struct
> > CachedPlan then? I tried to understand that based on comments upthread,
> > but I was unable to find anything.
>
> It used to be part of CachedPlan for a brief period of time (in patch
> v12 I posted in [1]), but David, in his reply to [1], said he wasn't
> so sure that it belonged there.
I'm not sure I necessarily agree with that. I'll have a look at v12 to
try and understand what David was so unhappy about.
> > (My first reaction to your above comment was "well, rename GetCachedPlan
> > then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
> > in any way a structure that must be "immutable" in the way parser output
> > is. Looking at the comment at the top of plancache.c it appears to me
> > that it isn't, but maybe I'm missing something.)
>
> CachedPlan *is* supposed to be read-only per the comment above
> CachedPlanSource definition:
>
> * ...If we are using a generic
> * cached plan then it is meant to be re-used across multiple executions, so
> * callers must always treat CachedPlans as read-only.
I read that as implying that the part_prune_results_list must remain
intact as long as no invalidations occur. Does part_prune_results_list
really change as a result of something other than a sinval event?
Keep in mind that if a sinval message that touches one of the relations
in the plan arrives, then we'll discard it and generate it afresh. I
don't see that the part_prune_results_list would change otherwise, but
maybe I misunderstand?
> FYI, there was even an idea of putting a PartitionPruneResults for a
> given PlannedStmt into the PlannedStmt itself [2], but PlannedStmt is
> supposed to be read-only too [3].
Hmm, I'm not familiar with PlannedStmt lifetime, but I'm definitely not
betting that Tom is wrong about this.
> Maybe we need some new overarching context when invoking plancache, if
> Portal can't already be it, whose struct can be passed to
> GetCachedPlan() to put the pruning results in? Perhaps,
> GetRunnablePlan() that you floated could be a wrapper for
> GetCachedPlan(), owning that new context.
Perhaps that is a solution. I'm not sure.
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"
* Re: generic plans and "initial" pruning
@ 2022-12-09 11:02 Amit Langote <[email protected]>
parent: Alvaro Herrera <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-12-09 11:02 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Fri, Dec 9, 2022 at 7:49 PM Alvaro Herrera <[email protected]> wrote:
> On 2022-Dec-09, Amit Langote wrote:
> > On Fri, Dec 9, 2022 at 6:52 PM Alvaro Herrera <[email protected]> wrote:
> > > Remind me again why is part_prune_results_list not part of struct
> > > CachedPlan then? I tried to understand that based on comments upthread,
> > > but I was unable to find anything.
> >
> > > (My first reaction to your above comment was "well, rename GetCachedPlan
> > > then, maybe to GetRunnablePlan", but then I'm wondering if CachedPlan is
> > > in any way a structure that must be "immutable" in the way parser output
> > > is. Looking at the comment at the top of plancache.c it appears to me
> > > that it isn't, but maybe I'm missing something.)
> >
> > CachedPlan *is* supposed to be read-only per the comment above
> > CachedPlanSource definition:
> >
> > * ...If we are using a generic
> > * cached plan then it is meant to be re-used across multiple executions, so
> > * callers must always treat CachedPlans as read-only.
>
> I read that as implying that the part_prune_results_list must remain
> intact as long as no invalidations occur. Does part_prune_results_list
> really change as a result of something other than a sinval event?
> Keep in mind that if a sinval message that touches one of the relations
> in the plan arrives, then we'll discard it and generate it afresh. I
> don't see that the part_prune_results_list would change otherwise, but
> maybe I misunderstand?
Pruning will be done afresh on every fetch of a given cached plan when
CheckCachedPlan() is called on it, so the part_prune_results_list part
will be discarded and rebuilt as many times as the plan is executed.
You'll find a description around CachedPlanSavePartitionPruneResults()
that's in v12.
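As a rough illustration of why the results cannot simply live in the shared, read-only CachedPlan, here is a toy, self-contained sketch. It is not code from any version of the patch: DemoCachedPlan, DemoPruneResult and demo_initial_pruning() are made-up stand-ins, and the "pruning" is just a range check on one parameter. The point is only that the same plan yields a different result on every fetch, so the result needs storage that belongs to the execution.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NPARTS 4

/* Hypothetical stand-in for a generic cached plan: shared and never modified. */
typedef struct DemoCachedPlan
{
    int lower[DEMO_NPARTS];     /* inclusive lower bound of each partition */
    int upper[DEMO_NPARTS];     /* exclusive upper bound of each partition */
} DemoCachedPlan;

/* Hypothetical per-execution result, rebuilt on every fetch of the plan. */
typedef struct DemoPruneResult
{
    bool survives[DEMO_NPARTS];
    int  nsurviving;
} DemoPruneResult;

/*
 * Toy "initial" pruning for a qual like "key = $1".  The plan is only read;
 * everything that depends on the parameter value goes into *result.
 */
static void
demo_initial_pruning(const DemoCachedPlan *plan, int param,
                     DemoPruneResult *result)
{
    result->nsurviving = 0;
    for (int i = 0; i < DEMO_NPARTS; i++)
    {
        result->survives[i] = (param >= plan->lower[i] &&
                               param < plan->upper[i]);
        if (result->survives[i])
            result->nsurviving++;
    }
}
```

Two fetches of the same plan with parameter values 5 and 25 fill two unrelated DemoPruneResult structs, and the plan itself stays untouched.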
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2022-12-09 11:37 Alvaro Herrera <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Alvaro Herrera @ 2022-12-09 11:37 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On 2022-Dec-09, Amit Langote wrote:
> Pruning will be done afresh on every fetch of a given cached plan when
> CheckCachedPlan() is called on it, so the part_prune_results_list part
> will be discarded and rebuilt as many times as the plan is executed.
> You'll find a description around CachedPlanSavePartitionPruneResults()
> that's in v12.
I see.
In that case, a separate container struct seems warranted.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Industry suffers from the managerial dogma that for the sake of stability
and continuity, the company should be independent of the competence of
individual employees." (E. Dijkstra)
* Re: generic plans and "initial" pruning
@ 2022-12-12 11:19 Amit Langote <[email protected]>
parent: Alvaro Herrera <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-12-12 11:19 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Fri, Dec 9, 2022 at 8:37 PM Alvaro Herrera <[email protected]> wrote:
> On 2022-Dec-09, Amit Langote wrote:
>
> > Pruning will be done afresh on every fetch of a given cached plan when
> > CheckCachedPlan() is called on it, so the part_prune_results_list part
> > will be discarded and rebuilt as many times as the plan is executed.
> > You'll find a description around CachedPlanSavePartitionPruneResults()
> > that's in v12.
>
> I see.
>
> In that case, a separate container struct seems warranted.
I thought about this today and played around with some container struct ideas.
Though, I started feeling like putting all the new logic being added
by this patch into plancache.c at the heart of GetCachedPlan() and
tweaking its API in kind of unintuitive ways may not have been such a
good idea to begin with. So I started thinking again about your
GetRunnablePlan() wrapper idea and thought maybe we could do something
with it. Let's say we name it GetCachedPlanLockPartitions() and put
the logic that does initial pruning with the new
ExecutorDoInitialPruning() in it, instead of in the normal
GetCachedPlan() path. Any callers that call GetCachedPlan() instead
call GetCachedPlanLockPartitions() with either the List ** parameter
as now or some container struct if that seems better. Whether
GetCachedPlanLockPartitions() needs to do anything other than return
the CachedPlan returned by GetCachedPlan() can be decided by the
latter setting, say, CachedPlan.has_unlocked_partitions. That will be
done by AcquireExecutorLocks() when it sees containsInitialPruning in
any of the PlannedStmts it sees, locking only the
PlannedStmt.minLockRelids set (which is all relations where no pruning
is needed!), leaving the partition locking to
GetCachedPlanLockPartitions(). If the CachedPlan is invalidated
during the partition locking phase, it calls GetCachedPlan() again;
maybe some refactoring is needed to avoid too much useless work in
such cases.
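The control flow described above can be sketched roughly as follows. This is a toy simulation under stated assumptions, not the proposed implementation: DemoPlan, DemoSession and all demo_* functions are hypothetical, "pruning" keeps only the partition whose index equals the parameter, and a concurrent change is simulated by a counter that invalidates the plan when a partition lock is taken.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NRELS 6

typedef struct DemoPlan
{
    bool is_valid;
    bool has_unlocked_partitions;   /* mirrors the flag floated above */
    bool prunable[DEMO_NRELS];      /* subject to initial pruning? */
} DemoPlan;

typedef struct DemoSession
{
    int  pending_invalidations;     /* simulated concurrent changes */
    int  nfetches;                  /* calls to demo_get_cached_plan() */
    bool locked[DEMO_NRELS];
} DemoSession;

/* Stand-in for GetCachedPlan(): locks only the non-prunable relations. */
static void
demo_get_cached_plan(DemoSession *session, DemoPlan *plan)
{
    session->nfetches++;
    plan->is_valid = true;
    plan->has_unlocked_partitions = false;
    for (int rel = 0; rel < DEMO_NRELS; rel++)
    {
        if (plan->prunable[rel])
            plan->has_unlocked_partitions = true;
        else
            session->locked[rel] = true;
    }
}

/* Taking a lock may reveal a concurrent change that invalidates the plan. */
static void
demo_lock_partition(DemoSession *session, DemoPlan *plan, int rel)
{
    session->locked[rel] = true;
    if (session->pending_invalidations > 0)
    {
        session->pending_invalidations--;
        plan->is_valid = false;
    }
}

/* Stand-in for the wrapper: prune, lock survivors, retry if invalidated. */
static int
demo_get_cached_plan_lock_partitions(DemoSession *session, DemoPlan *plan,
                                     int param)
{
    for (;;)
    {
        int nlocked = 0;

        demo_get_cached_plan(session, plan);
        if (!plan->has_unlocked_partitions)
            return 0;           /* nothing beyond the minimal set to lock */

        for (int rel = 0; rel < DEMO_NRELS; rel++)
        {
            if (plan->prunable[rel] && rel == param)    /* survives pruning */
            {
                demo_lock_partition(session, plan, rel);
                nlocked++;
            }
        }
        if (plan->is_valid)
            return nlocked;

        /* Plan went stale while locking: drop all locks and start over. */
        for (int rel = 0; rel < DEMO_NRELS; rel++)
            session->locked[rel] = false;
    }
}
```

With one pending invalidation the wrapper fetches the plan twice, which is the "useless work" case mentioned above that some refactoring might reduce.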
Thoughts?
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2022-12-12 17:24 Alvaro Herrera <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Alvaro Herrera @ 2022-12-12 17:24 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On 2022-Dec-12, Amit Langote wrote:
> I started feeling like putting all the new logic being added
> by this patch into plancache.c at the heart of GetCachedPlan() and
> tweaking its API in kind of unintuitive ways may not have been such a
> good idea to begin with. So I started thinking again about your
> GetRunnablePlan() wrapper idea and thought maybe we could do something
> with it. Let's say we name it GetCachedPlanLockPartitions() and put
> the logic that does initial pruning with the new
> ExecutorDoInitialPruning() in it, instead of in the normal
> GetCachedPlan() path. Any callers that call GetCachedPlan() instead
> call GetCachedPlanLockPartitions() with either the List ** parameter
> as now or some container struct if that seems better. Whether
> GetCachedPlanLockPartitions() needs to do anything other than return
> the CachedPlan returned by GetCachedPlan() can be decided by the
> latter setting, say, CachedPlan.has_unlocked_partitions. That will be
> done by AcquireExecutorLocks() when it sees containsInitialPruning in
> any of the PlannedStmts it sees, locking only the
> PlannedStmt.minLockRelids set (which is all relations where no pruning
> is needed!), leaving the partition locking to
> GetCachedPlanLockPartitions().
Hmm. This doesn't sound totally unreasonable, except to the point David
was making that perhaps we may want this container struct to accommodate
other things in the future than just the partition pruning results, so I
think its name (and that of the function that produces it) ought to be a
little more generic than that.
(I think this also answers your question on whether a List ** is better
than a container struct.)
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)
* Re: generic plans and "initial" pruning
@ 2022-12-14 08:35 Amit Langote <[email protected]>
parent: Alvaro Herrera <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-12-14 08:35 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Tue, Dec 13, 2022 at 2:24 AM Alvaro Herrera <[email protected]> wrote:
> On 2022-Dec-12, Amit Langote wrote:
> > I started feeling like putting all the new logic being added
> > by this patch into plancache.c at the heart of GetCachedPlan() and
> > tweaking its API in kind of unintuitive ways may not have been such a
> > good idea to begin with. So I started thinking again about your
> > GetRunnablePlan() wrapper idea and thought maybe we could do something
> > with it. Let's say we name it GetCachedPlanLockPartitions() and put
> > the logic that does initial pruning with the new
> > ExecutorDoInitialPruning() in it, instead of in the normal
> > GetCachedPlan() path. Any callers that call GetCachedPlan() instead
> > call GetCachedPlanLockPartitions() with either the List ** parameter
> > as now or some container struct if that seems better. Whether
> > GetCachedPlanLockPartitions() needs to do anything other than return
> > the CachedPlan returned by GetCachedPlan() can be decided by the
> > latter setting, say, CachedPlan.has_unlocked_partitions. That will be
> > done by AcquireExecutorLocks() when it sees containsInitialPruning in
> > any of the PlannedStmts it sees, locking only the
> > PlannedStmt.minLockRelids set (which is all relations where no pruning
> > is needed!), leaving the partition locking to
> > GetCachedPlanLockPartitions().
>
> Hmm. This doesn't sound totally unreasonable, except to the point David
> was making that perhaps we may want this container struct to accommodate
> other things in the future than just the partition pruning results, so I
> think its name (and that of the function that produces it) ought to be a
> little more generic than that.
>
> (I think this also answers your question on whether a List ** is better
> than a container struct.)
OK, so here's a WIP attempt at that.
I have moved the original functionality of GetCachedPlan() to
GetCachedPlanInternal(), turning the former into a sort of controller
as described shortly. The latter's CheckCachedPlan() part now only
locks the "minimal" set of non-prunable relations, making a note of
whether the plan contains any prunable subnodes and thus prunable
relations whose locking is deferred to the caller, GetCachedPlan().
GetCachedPlan(), as a sort of controller as mentioned before, does the
pruning if needed on the minimally valid plan returned by
GetCachedPlanInternal(), locks the partitions that survive, and redoes
the whole thing if the locking of partitions invalidates the plan.
The pruning results are returned through the new output parameter of
GetCachedPlan() of type CachedPlanExtra. I named it so after much
consideration, because all the new logic that produces stuff to put
into it is a part of the plancache module and has to do with
manipulating a CachedPlan. (I had considered CachedPlanExecInfo to
indicate that it contains information that is to be forwarded to the
executor, though that just didn't seem to fit in plancache.h.)
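To make the shape of that interface concrete, here is a minimal sketch of an output-parameter container. It is only an illustration under assumptions: DemoCachedPlanExtra and demo_get_cached_plan() are hypothetical and do not reproduce the struct layout or signature in the attached patch, and "pruning" is modeled as picking one hash partition per pruning step.

```c
#include <assert.h>

#define DEMO_MAX_PRUNEINFOS 4

/* Hypothetical result of initial pruning for one pruning step. */
typedef struct DemoPruneResult
{
    unsigned valid_subplans;    /* bit i set => subplan i survives */
} DemoPruneResult;

/*
 * Hypothetical per-fetch container.  Today it carries only pruning results,
 * but a generic name leaves room for other per-execution data later.
 */
typedef struct DemoCachedPlanExtra
{
    int             n_prune_results;
    DemoPruneResult prune_results[DEMO_MAX_PRUNEINFOS];
} DemoCachedPlanExtra;

/* Stand-in for the shared plan; nparts[] entries are assumed to be 1..32. */
typedef struct DemoCachedPlan
{
    int      n_prune_infos;
    unsigned nparts[DEMO_MAX_PRUNEINFOS];
} DemoCachedPlan;

/*
 * Returns the plan unchanged and reports per-execution data through *extra,
 * which the caller owns and forwards to execution.
 */
static const DemoCachedPlan *
demo_get_cached_plan(const DemoCachedPlan *plan, unsigned param,
                     DemoCachedPlanExtra *extra)
{
    extra->n_prune_results = plan->n_prune_infos;
    for (int i = 0; i < plan->n_prune_infos; i++)
        extra->prune_results[i].valid_subplans = 1u << (param % plan->nparts[i]);
    return plan;
}
```

Because the caller supplies the container, the plan can be declared const in the signature, which matches the read-only contract discussed upthread.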
I have broken out a few things into a preparatory patch 0001. Mainly,
it invents PlannedStmt.minLockRelids to replace the
AcquireExecutorLocks()'s current loop over the range table to figure
out the relations to lock. I also threw in a couple of pruning-related
non-functional changes to make 0002, which is the main patch, easier
to read.
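For readers unfamiliar with the bitmapset idiom, the minLockRelids idea can be sketched with a plain 64-bit mask. This is a simplified stand-in, not PostgreSQL's Bitmapset: DemoRelids and the demo_* helpers are hypothetical, they handle RT indexes 1..63 only, and demo_next_member() merely mimics the bms_next_member() convention of returning -2 when no members remain.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i set => RT index i is a member.  Supports indexes 1..63 only. */
typedef uint64_t DemoRelids;

/* Add every index in [lower, upper]; caller keeps upper below 64. */
static DemoRelids
demo_add_range(DemoRelids set, int lower, int upper)
{
    for (int i = lower; i <= upper; i++)
        set |= (UINT64_C(1) << i);
    return set;
}

/* Remove one index, e.g. a partition subject to initial pruning. */
static DemoRelids
demo_del_member(DemoRelids set, int member)
{
    return set & ~(UINT64_C(1) << member);
}

/* Smallest member greater than prev, or -2 if there is none. */
static int
demo_next_member(DemoRelids set, int prev)
{
    for (int i = prev + 1; i < 64; i++)
    {
        if (set & (UINT64_C(1) << i))
            return i;
    }
    return -2;
}

/* Visit only the members instead of walking the whole range table. */
static int
demo_collect_lock_relids(DemoRelids min_lock_relids, int *out)
{
    int n = 0;
    int rti = -1;

    while ((rti = demo_next_member(min_lock_relids, rti)) > 0)
        out[n++] = rti;
    return n;
}
```

Starting from the full range 1..6 and deleting the prunable indexes 4 and 5 leaves 1, 2, 3 and 6 as the relations to lock up front.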
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v29-0001-Preparatory-refactoring-before-reworking-CachedP.patch (17.2K, 2-v29-0001-Preparatory-refactoring-before-reworking-CachedP.patch)
From 14a1198bdaad007b1dc835f24caa42d3667c7048 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Tue, 13 Dec 2022 11:58:07 +0900
Subject: [PATCH v29 1/2] Preparatory refactoring before reworking CachedPlan
locking
Remember the RT indexes of RTEs that AcquireExecutorLocks() must
look at to consider locking in a bitmapset, so that instead of looping
over the range table to find those RTEs, it can look them up using
the RT indexes set in the bitmapset.
This also adds some extra information related to execution-time
pruning to the relevant plan nodes.
---
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 6 ++++
src/backend/nodes/readfuncs.c | 8 ++++--
src/backend/optimizer/plan/planner.c | 2 ++
src/backend/optimizer/plan/setrefs.c | 12 ++++++++
src/backend/partitioning/partprune.c | 42 ++++++++++++++++++++++++++--
src/backend/utils/cache/plancache.c | 10 +++++--
src/include/executor/execPartition.h | 2 ++
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 11 ++++++++
src/include/nodes/plannodes.h | 19 +++++++++++++
11 files changed, 106 insertions(+), 8 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a5b8e43ec5..65c4b63bbd 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -182,6 +182,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false; /* workers need not know! */
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 76d79b9741..5b62157712 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1956,6 +1956,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1966,6 +1967,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2016,6 +2019,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2023,6 +2028,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 966b75f5a6..1161671fa4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -796,7 +801,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5dd4f92720..620b163ef9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -523,8 +523,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 596f1fbc8e..ed43d5936d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -279,6 +279,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -377,9 +387,11 @@ set_plan_references(PlannerInfo *root, Plan *plan)
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
}
+
}
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ glob->containsInitialPruning |= pruneinfo->needs_init_pruning;
}
return result;
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..56270d7670 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,19 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the pruning steps contained in the returned PartitionedRelPruneInfos
+ * can be performed during executor startup and during execution,
+ * respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +478,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +569,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +646,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +679,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +692,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +707,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +732,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..339bb603f7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1747,7 +1747,8 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ Bitmapset *allLockRelids;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1760,14 +1761,17 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
*/
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+ Assert(plannedstmt->minLockRelids == NULL);
if (query)
ScanQueryForLocks(query, acquire);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ allLockRelids = plannedstmt->minLockRelids;
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..aeeaeb7884 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 1f33902947..c2f2544df5 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -218,6 +218,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 654dba61aa..4337e7aa34 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,17 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries; for AcquireExecutorLocks()'s
+ * perusal.
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index bddfe86191..eb0a007946 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,11 +73,18 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in the
* plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries; for
+ * AcquireExecutorLocks()'s perusal */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1417,6 +1424,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1428,6 +1442,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1472,6 +1488,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
--
2.35.3
[application/octet-stream] v29-0002-In-GetCachedPlan-only-lock-unpruned-partitions.patch (67.1K, 3-v29-0002-In-GetCachedPlan-only-lock-unpruned-partitions.patch)
From 69855fffacf69575471beb69da761babadc9f75c Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v29 2/2] In GetCachedPlan(), only lock unpruned partitions
This does two things mainly:
* The planner now removes the RT indexes of "initially prunable"
partitions from PlannedStmt.minLockRelids such that the set only
contains the relations not subject to initial partition pruning. So,
AcquireExecutorLocks only locks a subset of the relations contained
in a plan, deferring the locking of prunable relations to the caller.
* GetCachedPlan(), if there are prunable relations in the plan,
performs the initial partition pruning using available EXTERN params
and locks the partitions remaining after that, so that the CachedPlan
that's returned is valid in a race-free manner including for any
partitions that will be scanned during execution.
To make the pruning possible before entering ExecutorStart(), this
also adds ExecPartitionDoInitialPruning(), which can be called by
GetCachedPlan() for a given PlannedStmt.
The result of performing initial partition pruning this way is made
available to the actual execution via PartitionPruneResult, of which
there is one for every PartitionPruneInfo contained in the PlannedStmt.
The List of PartitionPruneResults for a given PlannedStmt is returned
to the callers of GetCachedPlan() via its new output parameter of type
CachedPlanExtra, whose members currently only include said List.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 28 ++-
src/backend/executor/README | 31 ++-
src/backend/executor/execMain.c | 2 +
src/backend/executor/execParallel.c | 25 ++-
src/backend/executor/execPartition.c | 215 +++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 31 ++-
src/backend/optimizer/plan/setrefs.c | 36 ++++
src/backend/tcop/postgres.c | 9 +-
src/backend/tcop/pquery.c | 28 ++-
src/backend/utils/cache/plancache.c | 257 +++++++++++++++++++++++--
src/backend/utils/mmgr/portalmem.c | 16 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 7 +-
src/include/executor/execdesc.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 4 +-
src/include/nodes/plannodes.h | 31 ++-
src/include/utils/plancache.h | 11 +-
src/include/utils/portal.h | 3 +
28 files changed, 694 insertions(+), 82 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 9ac0383459..65c8d0aa59 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -408,7 +408,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..729384a9a6 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,11 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +212,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -575,6 +583,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -619,7 +628,11 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -637,10 +650,17 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = NIL;
+
+ if (cplan_extra)
+ part_prune_results = list_nth_node(List,
+ cplan_extra->part_prune_results_list,
+ foreach_current_index(p));
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..2222b3ed6f 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -63,7 +63,36 @@ if the executor determines that an entire subplan is not required due to
execution time partition pruning determining that no matching records will be
found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
-subnode array will become out of sequence to the plan's subplan list.
+subnode array will become out of sequence to the plan's subplan list. Note
+that this is referred to as "initial" pruning, because it needs to occur only
+once during the execution startup, and uses a set of pruning steps called
+initial pruning steps (see PartitionedRelPruneInfo.initial_pruning_steps).
+
+Actually, "initial" pruning may occur even before execution startup in some
+cases. One example is when a cached generic plan is validated for
+execution, which works by locking all the relations that will be scanned by
+that plan during execution. If the generic plan contains plan nodes that have
+prunable child subnodes, then this validation locking is performed after
+pruning child subnodes that need not be scanned during execution, that is,
+using initial pruning steps. When such a generic plan is forwarded for
+execution, it must be accompanied by the set of PartitionPruneResult nodes that
+contain the result of that pruning, which basically consists of a bitmapset of
+child subnode indexes that survived the pruning and thus whose relations would
+have been locked for execution. This is important, because, unlike the
+plan-time pruning and actual executor-startup pruning, this does not actually
+remove the pruned subnodes from the plan tree, but only marks them as being
+pruned. So, the executor code (core or third party), especially one that runs
+before ExecutorStart() and thus looks at bare Plan trees (not PlanState trees)
+must beware of plan nodes that may actually have been pruned and are thus
+subject to invalidation by concurrent schema changes. For plan nodes that can
+have prunable child subnodes and thus contain a PartitionPruneInfo, such code
+must always check if the corresponding PartitionPruneResult exists
+in EState.es_part_prune_results at given part_prune_index and use that to
+decide which subplans are valid for execution instead of redoing the pruning.
+Note that that is not just a performance optimization but also necessary to
+avoid possibly ending up considering a different set of child subnodes as valid
+than the set CachedPlanLockPartitions() would have locked the relations of, if
+the pruning steps produce a different result when executed multiple times.
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2c2b3a8874..229f61f72e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -798,6 +798,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -819,6 +820,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 65c4b63bbd..9745eba0af 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -599,12 +600,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -633,6 +637,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -659,6 +664,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -753,6 +763,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1234,8 +1250,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1246,12 +1264,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5b62157712..dcd2bb0f90 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1742,7 +1748,8 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
+ * done once during executor startup or even before that, such as when called
+ * from CachedPlanLockPartitions(). Expressions that do involve such Params
* require us to prune separately for each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
@@ -1760,6 +1767,12 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the set of the parent plan node's
+ * child subnodes that are valid for execution and also the set of the RT
+ * indexes of leaf partitions scanned by those subnodes.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1780,8 +1793,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * That set is computed by either performing the "initial pruning" here or
+ * reusing the one present in EState.es_part_prune_results[part_prune_index]
+ * if it has been set, which is the case when CachedPlanLockPartitions() has
+ * already done the initial pruning.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,9 +1809,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1812,20 +1828,62 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* Initial pruning already done if es_part_prune_results has been set. */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth_node(PartitionPruneResult,
+ estate->es_part_prune_results,
+ part_prune_index);
+ if (!bms_equal(pruneresult->root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("pruneresult relids %s, pruneinfo relids %s",
+ bmsToString(pruneresult->root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
+ }
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1833,7 +1891,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1849,11 +1908,58 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the set of the parent plan node's child subnodes that are valid for
+ * execution
+ *
+ * On return, *scan_leafpart_rtis will contain the RT indexes of leaf
+ * partitions scanned by those valid subnodes.
+ *
+ * Note that this does not share state with the actual execution, so it must
+ * make do with the information present in the PlannedStmt. For example, there isn't
+ * a PlanState for the parent plan node yet, so we must create a standalone
+ * ExprContext to evaluate pruning expressions, equipped with the information
+ * about the EXTERN parameters that we do have. Note that that's okay because
+ * the initial pruning steps do not contain anything that would require the
+ * execution to have started. Likewise, we create our own PartitionDirectory
+ * to look up the PartitionDescs to use.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /* Don't omit detached partitions, just like during execution proper. */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1867,19 +1973,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1934,15 +2042,39 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation ourselves when called before the
+ * execution has started, such as when called from
+ * CachedPlanLockPartitions(). In that case, sub-partitions must
+ * be locked, because AcquirePlannerLocks() would have locked only
+ * the root parent.
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -2050,7 +2182,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2060,7 +2192,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2288,10 +2420,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2326,7 +2462,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2340,6 +2476,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2350,13 +2488,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes
+ * of the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2383,8 +2523,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2392,7 +2538,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 87f4d53ca7..7d36c972d3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -139,6 +139,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..2ecb9193aa 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1577,6 +1577,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra;
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1657,7 +1658,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1690,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2067,6 +2075,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2092,8 +2101,12 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cplan_extra);
Assert(cplan == plansource->gplan);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
@@ -2399,6 +2412,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
CachedPlan *cplan = NULL;
+ CachedPlanExtra *cplan_extra = NULL;
ListCell *lc1;
/*
@@ -2549,8 +2563,12 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
stmt_list = cplan->stmt_list;
/*
@@ -2592,9 +2610,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = NIL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ if (cplan_extra)
+ part_prune_results = list_nth_node(List,
+ cplan_extra->part_prune_results_list,
+ foreach_current_index(lc2));
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2663,7 +2686,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ed43d5936d..db27cae297 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -372,6 +372,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
ListCell *l;
+ Bitmapset *leafpart_rtis = NULL;
pruneinfo->root_parent_relids =
offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
@@ -383,17 +384,52 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the set of relations to be
+ * locked by AcquireExecutorLocks(). The actual set of leaf
+ * partitions to be locked is computed by
+ * CachedPlanLockPartitions().
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
glob->containsInitialPruning |= pruneinfo->needs_init_pruning;
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f8808d2191..9c1c7bfa9e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,10 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
/*
* Now we can define the portal.
@@ -1987,6 +1991,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..32e6b7b767 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results;
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: pruning results returned by CachedPlanLockPartitions()
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +495,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan_extra == NULL ? NIL :
+ linitial(portal->cplan_extra->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1234,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1282,19 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->cplan_extra)
+ part_prune_results = list_nth_node(List,
+ portal->cplan_extra->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 339bb603f7..7bd94e7632 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -59,6 +59,7 @@
#include "access/transam.h"
#include "catalog/namespace.h"
#include "executor/executor.h"
+#include "executor/execPartition.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/optimizer.h"
@@ -96,17 +97,20 @@ static dlist_head saved_plan_list = DLIST_STATIC_INIT(saved_plan_list);
*/
static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_list);
+static CachedPlan *GetCachedPlanInternal(CachedPlanSource *plansource,
+ ParamListInfo boundParams, ResourceOwner owner,
+ QueryEnvironment *queryEnv, bool *hasUnlockedParts);
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, bool *hasUnlockedParts);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static bool AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -783,16 +787,23 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid and
+ * set *hasUnlockedParts if any PlannedStmt contains "initially" prunable
+ * subnodes; partitions are not locked till initial pruning is done.
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
+ * On a "true" return, we have acquired the minimal set of locks needed to run
+ * the plan, that is, excluding partitions that are subject to being pruned
+ * before execution. The caller must perform that pruning and lock the
+ * partitions that remain before actually telling the world that the plan is
+ * "valid".
+ *
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, bool *hasUnlockedParts)
{
CachedPlan *plan = plansource->gplan;
@@ -826,7 +837,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ *hasUnlockedParts = AcquireExecutorLocks(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +859,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ (void) AcquireExecutorLocks(plan->stmt_list, false);
}
/*
@@ -1120,7 +1131,125 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
}
/*
- * GetCachedPlan: get a cached plan from a CachedPlanSource.
+ * For each PlannedStmt in plan->stmt_list, do initial partition pruning if
+ * needed and lock partitions that survive.
+ *
+ * The returned list of the same length as plan->stmt_list will contain either
+ * a NIL if the PlannedStmt did not contain any PartitionPruneInfos requiring
+ * initial pruning or a List of PartitionPruneResult that in turn contains
+ * an element for each PartitionPruneInfo found in stmt->partPruneInfos.
+ *
+ * Also, on return, *lockedRelids_per_stmt, that will be made of the same
+ * length as plan->stmt_list, will contain either a NULL if no additional
+ * relations needed to be locked for the PlannedStmt, or a bitmapset of RT
+ * indexes of partitions locked.
+ */
+static bool
+CachedPlanLockPartitions(CachedPlan *plan,
+ ParamListInfo boundParams,
+ ResourceOwner owner,
+ List **part_prune_results_list,
+ List **lockedRelids_per_stmt)
+{
+ List *my_part_prune_results_list = NIL;
+ List *my_lockedRelids_per_stmt = NIL;
+ ListCell *lc1;
+ MemoryContext oldcontext,
+ tmpcontext;
+
+ *part_prune_results_list = NIL;
+ *lockedRelids_per_stmt = NIL;
+
+ /*
+ * Create a temporary context for memory allocations required while
+ * executing partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlanLockPartitions() working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+ foreach(lc1, plan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockPartRelids = NULL;
+ int rti;
+ List *part_prune_results = NIL;
+ Bitmapset *lockedRelids = NULL;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, because AcquireExecutorLocks on the
+ * parent CachedPlan would have dealt with these. Though, do let
+ * the caller know that no pruning is applicable to this statement.
+ */
+ my_part_prune_results_list = lappend(my_part_prune_results_list,
+ NIL);
+ my_lockedRelids_per_stmt = lappend(my_lockedRelids_per_stmt, NULL);
+ continue;
+ }
+
+ /* Figure out the partitions that would need to be locked. */
+ if (plannedstmt->containsInitialPruning)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc2);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->root_parent_relids =
+ bms_copy(pruneinfo->root_parent_relids);
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, boundParams,
+ pruneinfo,
+ &lockPartRelids);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+ }
+
+ rti = -1;
+ while ((rti = bms_next_member(lockPartRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ my_part_prune_results_list = lappend(my_part_prune_results_list,
+ part_prune_results);
+ my_lockedRelids_per_stmt = lappend(my_lockedRelids_per_stmt,
+ lockedRelids);
+ }
+
+ /*
+ * If the plan is still valid, copy the prune results and lockRelids
+ * bitmapsets into the caller's context.
+ */
+ MemoryContextSwitchTo(oldcontext);
+ if (plan->is_valid)
+ {
+ *part_prune_results_list = copyObject(my_part_prune_results_list);
+ *lockedRelids_per_stmt = copyObject(my_lockedRelids_per_stmt);
+ }
+
+ /* Clear up the temporary context. */
+ MemoryContextDelete(tmpcontext);
+ return plan->is_valid;
+}
+
+/*
+ * GetCachedPlan: get a cached plan from a CachedPlanSource
*
* This function hides the logic that decides whether to use a generic
* plan or a custom plan for the given parameters: the caller does not know
@@ -1139,7 +1268,97 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanExtra **extra)
+{
+ CachedPlan *plan;
+
+ Assert(extra != NULL);
+ *extra = NULL;
+ for (;;)
+ {
+ bool hasUnlockedParts = false;
+
+ /* Actually get the plan. */
+ plan = GetCachedPlanInternal(plansource, boundParams, owner, queryEnv,
+ &hasUnlockedParts);
+ Assert(plan->is_valid);
+
+ /* Nothing to do if all relations already locked. */
+ if (!hasUnlockedParts)
+ return plan;
+ else
+ {
+ /*
+ * Do initial pruning to filter out partitions that need not be
+ * locked for execution.
+ */
+ ListCell *lc1,
+ *lc2;
+ List *part_prune_results_list;
+ List *lockedRelids_per_stmt;
+
+ /* Only a generic plan can ever have unlocked partitions in it. */
+ Assert(plan == plansource->gplan);
+
+ /*
+ * This does:
+ *
+ * 1) the pruning, returning in part_prune_results_list the
+ * PartitionPruneResult Lists for all statements
+ *
+ * 2) lock partitions that survive in each statement, returning
+ * in lockedRelids_per_stmt the RT indexes of those locked.
+ *
+ * True is returned if the plan is still valid after locking all
+ * partitions; false otherwise, in which case we must get a new
+ * plan.
+ */
+ if (CachedPlanLockPartitions(plan, boundParams, owner,
+ &part_prune_results_list,
+ &lockedRelids_per_stmt))
+ {
+ Assert(plan->is_valid);
+ *extra = (CachedPlanExtra *) palloc(sizeof(CachedPlanExtra));
+ (*extra)->part_prune_results_list = part_prune_results_list;
+ return plan;
+ }
+
+ /*
+ * Release the locks and start over. This is the same as what
+ * CheckCachedPlan does when doing AcquireExecutorLocks() causes
+ * the plan to be invalidated.
+ */
+ forboth(lc1, plan->stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst(lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue;
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ }
+ }
+
+ Assert(false);
+ return NULL;
+}
+
+/* Internal workhorse of GetCachedPlan() */
+static CachedPlan *
+GetCachedPlanInternal(CachedPlanSource *plansource, ParamListInfo boundParams,
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ bool *hasUnlockedParts)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1160,7 +1379,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (CheckCachedPlan(plansource, hasUnlockedParts))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1738,11 +1957,16 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * If some PlannedStmt(s) contain "initially prunable" partitions, they are not
+ * locked here. Instead, the caller is informed of their existence so that it
+ * can lock them after doing the initial pruning.
*/
-static void
+static bool
AcquireExecutorLocks(List *stmt_list, bool acquire)
{
ListCell *lc1;
+ bool hasUnlockedParts = false;
foreach(lc1, stmt_list)
{
@@ -1763,10 +1987,17 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Assert(plannedstmt->minLockRelids == NULL);
if (query)
- ScanQueryForLocks(query, acquire);
+ ScanQueryForLocks(query, true);
continue;
}
+ /*
+ * If partitions can be pruned before execution, defer their locking to
+ * the caller.
+ */
+ if (plannedstmt->containsInitialPruning)
+ hasUnlockedParts = true;
+
allLockRelids = plannedstmt->minLockRelids;
rti = -1;
while ((rti = bms_next_member(allLockRelids, rti)) > 0)
@@ -1788,6 +2019,8 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
+
+ return hasUnlockedParts;
}
/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..94a9db84e3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,22 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * Copies the given CachedPlanExtra struct into the portal.
+ */
+void
+PortalSaveCachedPlanExtra(Portal portal, CachedPlanExtra *extra)
+{
+ MemoryContext oldcxt = MemoryContextSwitchTo(portal->portalContext);
+
+ Assert(portal->cplan_extra == NULL && extra != NULL);
+ portal->cplan_extra = (CachedPlanExtra *)
+ palloc(sizeof(CachedPlanExtra));
+ portal->cplan_extra->part_prune_results_list =
+ copyObject(extra->part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index aeeaeb7884..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -129,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..5a7d075750 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* PartitionPruneResults returned by
+ * CachedPlanLockPartitions() */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9a64a830a2..f1374057e5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -617,6 +617,7 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4337e7aa34..10f12e780e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -134,8 +134,8 @@ typedef struct PlannerGlobal
bool containsInitialPruning;
/*
- * Indexes of all range table entries; for AcquireExecutorLocks()'s
- * perusal.
+ * Indexes of all range table entries except those of leaf partitions
+ * scanned by prunable subplans; for AcquireExecutorLocks()'s perusal.
*/
Bitmapset *minLockRelids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index eb0a007946..ab8bc74e4a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -82,7 +82,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
- Bitmapset *minLockRelids; /* Indexes of all range table entries; for
+ Bitmapset *minLockRelids; /* Indexes of all range table entries except
+ * those of leaf partitions scanned by
+ * prunable subplans; for
* AcquireExecutorLocks()'s perusal */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
@@ -1575,6 +1577,33 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * root_parent_relids is same as PartitionPruneInfo.root_parent_relids. It's
+ * there for cross-checking in ExecInitPartitionPruning() that the
+ * PartitionPruneResult and the PartitionPruneInfo at a given index in
+ * EState.es_part_prune_results and EState.es_part_prune_infos, respectively,
+ * belong to the same parent plan node.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before the execution has started, such as in
+ * CachedPlanLockPartitions().
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *root_parent_relids;
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..4ac66d2761 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -160,6 +160,14 @@ typedef struct CachedPlan
MemoryContext context; /* context containing this CachedPlan */
} CachedPlan;
+/*
+ * Additional information to pass the executor when executing a CachedPlan.
+ */
+typedef struct CachedPlanExtra
+{
+ List *part_prune_results_list;
+} CachedPlanExtra;
+
/*
* CachedExpression is a low-overhead mechanism for caching the planned form
* of standalone scalar expressions. While such expressions are not usually
@@ -220,7 +228,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanExtra **extra);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..49bb00cda5 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanExtra *cplan_extra; /* CachedPlanExtra for cplan in Portal's
+ * memory */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +244,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalSaveCachedPlanExtra(Portal portal, CachedPlanExtra *extra);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
* Re: generic plans and "initial" pruning
@ 2022-12-16 02:33 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2022-12-16 02:33 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Wed, Dec 14, 2022 at 5:35 PM Amit Langote <[email protected]> wrote:
> I have moved the original functionality of GetCachedPlan() to
> GetCachedPlanInternal(), turning the former into a sort of controller
> as described shortly. The latter's CheckCachedPlan() part now only
> locks the "minimal" set of, non-prunable, relations, making a note of
> whether the plan contains any prunable subnodes and thus prunable
> relations whose locking is deferred to the caller, GetCachedPlan().
> GetCachedPlan(), as a sort of controller as mentioned before, does the
> pruning if needed on the minimally valid plan returned by
> GetCachedPlanInternal(), locks the partitions that survive, and redoes
> the whole thing if the locking of partitions invalidates the plan.
After sleeping on it, I realized this doesn't have to be that
complicated. Rather than turn GetCachedPlan() into a wrapper for
handling deferred partition locking as outlined above, I could have
changed it more simply as follows to get the same thing done:
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ bool hasUnlockedParts = false;
+
+ if (CheckCachedPlan(plansource, &hasUnlockedParts) &&
+ hasUnlockedParts &&
+ CachedPlanLockPartitions(plansource, boundParams, owner, extra))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
The attached updated patch does it like that.
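For readers following along without the patch applied, the control flow described above can be sketched in standalone C. This is only an illustration under stated assumptions: ToyPlanSource, toy_check_cached_plan(), toy_lock_partitions() and toy_get_cached_plan() are hypothetical stand-ins, not the plancache.c structures or functions. The sketch also assumes that a generic plan with no deferred (prunable) partition locks is usable as soon as the basic check passes, which is a reading of the intent rather than something the fragment above spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached plan source; not PostgreSQL's type. */
typedef struct ToyPlanSource
{
	bool		gplan_valid;		/* generic plan exists and passed checks */
	bool		has_prunable;		/* plan has initially prunable partitions */
	bool		lock_invalidates;	/* simulate concurrent DDL seen while locking */
} ToyPlanSource;

typedef enum ToyDecision
{
	TOY_USE_GENERIC,
	TOY_REPLAN
} ToyDecision;

/*
 * Stand-in for the basic validity check: locks only the "minimal" set of
 * relations and reports whether locking of prunable partitions was deferred.
 */
static bool
toy_check_cached_plan(const ToyPlanSource *ps, bool *hasUnlockedParts)
{
	assert(ps != NULL && hasUnlockedParts != NULL);

	*hasUnlockedParts = false;
	if (!ps->gplan_valid)
		return false;
	*hasUnlockedParts = ps->has_prunable;
	return true;
}

/*
 * Stand-in for pruning plus locking of the surviving partitions.  Returns
 * false if taking those locks revealed that the plan is no longer valid.
 */
static bool
toy_lock_partitions(ToyPlanSource *ps)
{
	if (ps->lock_invalidates)
	{
		ps->gplan_valid = false;
		return false;
	}
	return true;
}

/* Decide whether the cached generic plan can be used or must be rebuilt. */
static ToyDecision
toy_get_cached_plan(ToyPlanSource *ps)
{
	bool		hasUnlockedParts = false;

	/* Assumption: nothing deferred means the plan is already fully locked. */
	if (toy_check_cached_plan(ps, &hasUnlockedParts) &&
		(!hasUnlockedParts || toy_lock_partitions(ps)))
		return TOY_USE_GENERIC;

	return TOY_REPLAN;
}
```

In this toy model, a plan whose partition locking fails is marked invalid and the caller falls through to replanning, mirroring the "redo the whole thing" step from the quoted description.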
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v30-0002-In-GetCachedPlan-only-lock-unpruned-partitions.patch (66.2K, 2-v30-0002-In-GetCachedPlan-only-lock-unpruned-partitions.patch)
From 4176843628ef29c1ff173ad0dfbdd13f7d07c225 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Wed, 22 Dec 2021 16:55:17 +0900
Subject: [PATCH v30 2/2] In GetCachedPlan(), only lock unpruned partitions
This does two things mainly:
* The planner now removes the RT indexes of "initially prunable"
partitions from PlannedStmt.minLockRelids such that the set only
contains the relations not subject to initial partition pruning. So,
AcquireExecutorLocks only locks a subset of the relations contained
in a plan, deferring the locking of prunable relations to the caller.
* GetCachedPlan(), if there are prunable relations in the plan,
performs the initial partition pruning using available EXTERN params
and locks the partitions remaining after that, so that the CachedPlan
that's returned is valid in a race-free manner including for any
partitions that will be scanned during execution.
To make the pruning possible before entering ExecutorStart(), this
also adds ExecPartitionDoInitialPruning(), which can be called by
GetCachedPlan() for a given PlannedStmt.
The result of performing initial partition pruning this way is made
available to the actual execution via PartitionPruneResult, of which
there is one for every PartitionPruneInfo contained in the PlannedStmt.
The List of PartitionPruneResult for a given PlannedStmt is returned
to the callers of GetCachedPlan() via its new output parameter of type
CachedPlanExtra, whose members currently only include said List.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/prepare.c | 28 +++-
src/backend/executor/README | 31 +++-
src/backend/executor/execMain.c | 2 +
src/backend/executor/execParallel.c | 25 ++-
src/backend/executor/execPartition.c | 215 +++++++++++++++++++++----
src/backend/executor/execUtils.c | 1 +
src/backend/executor/functions.c | 2 +-
src/backend/executor/nodeAppend.c | 11 +-
src/backend/executor/nodeMergeAppend.c | 5 +-
src/backend/executor/spi.c | 31 +++-
src/backend/optimizer/plan/setrefs.c | 36 +++++
src/backend/tcop/postgres.c | 9 +-
src/backend/tcop/pquery.c | 28 +++-
src/backend/utils/cache/plancache.c | 204 +++++++++++++++++++++--
src/backend/utils/mmgr/portalmem.c | 16 ++
src/include/commands/explain.h | 4 +-
src/include/executor/execPartition.h | 7 +-
src/include/executor/execdesc.h | 3 +
src/include/nodes/execnodes.h | 1 +
src/include/nodes/pathnodes.h | 4 +-
src/include/nodes/plannodes.h | 31 +++-
src/include/utils/plancache.h | 11 +-
src/include/utils/portal.h | 3 +
28 files changed, 640 insertions(+), 83 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f26cc0d162..401a2280a3 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -558,7 +558,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 152c29b551..942449544c 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -325,7 +325,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NIL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index f86983c660..2f2b558608 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -407,7 +407,7 @@ ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NIL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL));
}
}
@@ -515,7 +515,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, List *part_prune_results,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage)
@@ -563,7 +564,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, part_prune_results, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index cf1b1ca571..904cbcba4a 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -779,7 +779,7 @@ execute_sql_string(const char *sql)
{
QueryDesc *qdesc;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, NIL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 8ba2436a71..049a90f49d 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -409,7 +409,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NIL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 9e29584d93..729384a9a6 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,11 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
plan_list = cplan->stmt_list;
/*
@@ -207,6 +212,9 @@ ExecuteQuery(ParseState *pstate,
plan_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
* statement is one that produces tuples. Currently we insist that it be
@@ -575,6 +583,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -619,7 +628,11 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Replan if needed, and acquire a transient refcount */
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, queryEnv);
+ CurrentResourceOwner, queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -637,10 +650,17 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ List *part_prune_results = NIL;
+
+ if (cplan_extra)
+ part_prune_results = list_nth_node(List,
+ cplan_extra->part_prune_results_list,
+ foreach_current_index(p));
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, queryEnv,
- &planduration, (es->buffers ? &bufusage : NULL));
+ ExplainOnePlan(pstmt, part_prune_results, into, es, query_string,
+ paramLI, queryEnv, &planduration,
+ (es->buffers ? &bufusage : NULL));
else
ExplainOneUtility(pstmt->utilityStmt, into, es, query_string,
paramLI, queryEnv);
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 17775a49e2..2222b3ed6f 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -63,7 +63,36 @@ if the executor determines that an entire subplan is not required due to
execution time partition pruning determining that no matching records will be
found there. This currently only occurs for Append and MergeAppend nodes. In
this case the non-required subplans are ignored and the executor state's
-subnode array will become out of sequence to the plan's subplan list.
+subnode array will become out of sequence to the plan's subplan list. Note
+that this is referred to as "initial" pruning, because it needs to occur only
+once during the execution startup, and uses a set of pruning steps called
+initial pruning steps (see PartitionedRelPruneInfo.initial_pruning_steps).
+
+Actually, "initial" pruning may occur even before the execution startup in
+some cases. For example, when a cached generic plan is validated for
+execution, which works by locking all the relations that will be scanned by
+that plan during execution. If the generic plan contains plan nodes that have
+prunable child subnodes, then this validation locking is performed after
+pruning child subnodes that need not be scanned during execution, that is,
+using initial pruning steps. When such a generic plan is forwarded for
+execution, it must be accompanied by the set of PartitionPruneResult nodes that
+contain the result of that pruning, which basically consists of a bitmapset of
+child subnode indexes that survived the pruning and thus whose relations would
+have been locked for execution. This is important, because, unlike the
+plan-time pruning and actual executor-startup pruning, this does not actually
+remove the pruned subnodes from the plan tree, but only marks them as being
+pruned. So, the executor code (core or third party), especially one that runs
+before ExecutorStart() and thus looks at bare Plan trees (not PlanState trees)
+must beware of plan nodes that may actually have been pruned and thus subject
+to being invalidated by concurrent schema changes. For plan nodes that can
+have prunable child subnodes and thus contain a PartitionPruneInfo, such code
+must always check if the corresponding PartitionPruneResult exists
+in EState.es_part_prune_results at given part_prune_index and use that to
+decide which subplans are valid for execution instead of redoing the pruning.
+Note that that is not just a performance optimization but also necessary to
+avoid possibly ending up considering a different set of child subnodes as valid
+than the set CachedPlanLockPartitions() would have locked the relations of, if
+the pruning steps produce a different result when executed multiple times.
Each Plan node may have expression trees associated with it, to represent
its target list, qualification conditions, etc. These trees are also
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2c2b3a8874..229f61f72e 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -798,6 +798,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ List *part_prune_results = queryDesc->part_prune_results;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -819,6 +820,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
estate->es_plannedstmt = plannedstmt;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ estate->es_part_prune_results = part_prune_results;
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 65c4b63bbd..9745eba0af 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -66,6 +66,7 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -599,12 +600,15 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -633,6 +637,7 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -659,6 +664,11 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized List of PartitionPruneResult. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -753,6 +763,12 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized List of PartitionPruneResult */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS,
+ part_prune_results_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1234,8 +1250,10 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
ParamListInfo paramLI;
char *queryString;
@@ -1246,12 +1264,17 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
pstmtspace = shm_toc_lookup(toc, PARALLEL_KEY_PLANNEDSTMT, false);
pstmt = (PlannedStmt *) stringToNode(pstmtspace);
+ /* Reconstruct leader-supplied PartitionPruneResult. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+
/* Reconstruct ParamListInfo. */
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
/* Create a QueryDesc for the query. */
- return CreateQueryDesc(pstmt,
+ return CreateQueryDesc(pstmt, part_prune_results,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 5b62157712..dcd2bb0f90 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -25,6 +25,7 @@
#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
+#include "parser/parsetree.h"
#include "partitioning/partbounds.h"
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
@@ -185,7 +186,11 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(PlanState *planstate,
- PartitionPruneInfo *pruneinfo);
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -198,7 +203,8 @@ static void PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans);
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis);
/*
@@ -1742,7 +1748,8 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* considered to be a stable expression, it can change value from one plan
* node scan to the next during query execution. Stable comparison
* expressions that don't involve such Params allow partition pruning to be
- * done once during executor startup. Expressions that do involve such Params
+ * done once during executor startup or even before that, such as when called
+ * from CachedPlanLockPartitions(). Expressions that do involve such Params
* require us to prune separately for each scan of the parent plan node.
*
* Note that pruning away unneeded subplans during executor startup has the
@@ -1760,6 +1767,12 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* account for initial pruning possibly having eliminated some of the
* subplans.
*
+ * ExecPartitionDoInitialPruning:
+ * Do initial pruning with the information contained in a given
+ * PartitionPruneInfo to determine the set of the parent plan node's
+ * child subnodes that are valid for execution and also the set of the RT
+ * indexes of leaf partitions scanned by those subnodes.
+ *
* ExecFindMatchingSubPlans:
* Returns indexes of matching subplans after evaluating the expressions
* that are safe to evaluate at a given point. This function is first
@@ -1780,8 +1793,10 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* On return, *initially_valid_subplans is assigned the set of indexes of
* child subplans that must be initialized along with the parent plan node.
- * Initial pruning is performed here if needed and in that case only the
- * surviving subplans' indexes are added.
+ * That set is computed by either performing the "initial pruning" here or
+ * reusing the one present in EState.es_part_prune_results[part_prune_index]
+ * if it has been set, which is the case when CachedPlanLockPartitions() has
+ * already done the initial pruning.
*
* If subplans are indeed pruned, subplan_map arrays contained in the returned
* PartitionPruneState are re-sequenced to not count those, though only if the
@@ -1794,9 +1809,10 @@ ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans)
{
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = NULL;
EState *estate = planstate->state;
PartitionPruneInfo *pruneinfo;
+ PartitionPruneResult *pruneresult = NULL;
/* Obtain the pruneinfo we need, and make sure it's the right one */
pruneinfo = list_nth(estate->es_part_prune_infos, part_prune_index);
@@ -1812,20 +1828,62 @@ ExecInitPartitionPruning(PlanState *planstate,
/* We may need an expression context to evaluate partition exprs */
ExecAssignExprContext(estate, planstate);
- /* Create the working data structure for pruning */
- prunestate = CreatePartitionPruneState(planstate, pruneinfo);
+ /* Initial pruning already done if es_part_prune_results has been set. */
+ if (estate->es_part_prune_results)
+ {
+ pruneresult = list_nth_node(PartitionPruneResult,
+ estate->es_part_prune_results,
+ part_prune_index);
+ if (!bms_equal(pruneresult->root_parent_relids, pruneinfo->root_parent_relids))
+ ereport(ERROR,
+ errcode(ERRCODE_INTERNAL_ERROR),
+ errmsg_internal("mismatching PartitionPruneInfo and PartitionPruneResult at part_prune_index %d",
+ part_prune_index),
+ errdetail_internal("pruneresult relids %s, pruneinfo relids %s",
+ bmsToString(pruneresult->root_parent_relids),
+ bmsToString(pruneinfo->root_parent_relids)));
+ }
+
+ if (pruneresult == NULL || pruneinfo->needs_exec_pruning)
+ {
+ /* We may need an expression context to evaluate partition exprs */
+ ExecAssignExprContext(estate, planstate);
+
+ /* For data reading, executor always omits detached partitions */
+ if (estate->es_partition_directory == NULL)
+ estate->es_partition_directory =
+ CreatePartitionDirectory(estate->es_query_cxt, false);
+
+ /*
+ * Create the working data structure for pruning. No need to consider
+ * initial pruning steps if we have a PartitionPruneResult.
+ */
+ prunestate = CreatePartitionPruneState(planstate, pruneinfo,
+ pruneresult == NULL,
+ pruneinfo->needs_exec_pruning,
+ NIL, planstate->ps_ExprContext,
+ estate->es_partition_directory);
+ }
/*
* Perform an initial partition prune pass, if required.
*/
- if (prunestate->do_initial_prune)
- *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true);
+ if (pruneresult)
+ {
+ *initially_valid_subplans = bms_copy(pruneresult->valid_subplan_offs);
+ }
+ else if (prunestate && prunestate->do_initial_prune)
+ {
+ *initially_valid_subplans = ExecFindMatchingSubPlans(prunestate, true,
+ NULL);
+ }
else
{
- /* No pruning, so we'll need to initialize all subplans */
+ /* No initial pruning, so we'll need to initialize all subplans */
Assert(n_total_subplans > 0);
*initially_valid_subplans = bms_add_range(NULL, 0,
n_total_subplans - 1);
+ return prunestate;
}
/*
@@ -1833,7 +1891,8 @@ ExecInitPartitionPruning(PlanState *planstate,
* that were removed above due to initial pruning. No need to do this if
* no steps were removed.
*/
- if (bms_num_members(*initially_valid_subplans) < n_total_subplans)
+ if (prunestate &&
+ bms_num_members(*initially_valid_subplans) < n_total_subplans)
{
/*
* We can safely skip this when !do_exec_prune, even though that
@@ -1849,11 +1908,58 @@ ExecInitPartitionPruning(PlanState *planstate,
return prunestate;
}
+/*
+ * ExecPartitionDoInitialPruning
+ * Perform initial pruning using given PartitionPruneInfo to determine
+ * the set of the parent plan node's child subnodes that are valid for
+ * execution
+ *
+ * On return, *scan_leafpart_rtis will contain the RT indexes of leaf
+ * partitions scanned by those valid subnodes.
+ *
+ * This does not share state with the actual execution, so it must make do
+ * with the information present in the PlannedStmt. For example, there isn't
+ * a PlanState for the parent plan node yet, so we must create a standalone
+ * ExprContext to evaluate pruning expressions, equipped with the information
+ * about the EXTERN parameters that we do have. Note that that's okay because
+ * the initial pruning steps do not contain anything that would require the
+ * execution to have started. Likewise, we create our own PartitionDirectory
+ * to look up the PartitionDescs to use.
+ */
+Bitmapset *
+ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt, ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis)
+{
+ List *rtable = plannedstmt->rtable;
+ ExprContext *econtext;
+ PartitionDirectory pdir;
+ PartitionPruneState *prunestate;
+ Bitmapset *valid_subplan_offs;
+
+ /* Don't omit detached partitions, just like during execution proper. */
+ pdir = CreatePartitionDirectory(CurrentMemoryContext, false);
+ econtext = CreateStandaloneExprContext();
+ econtext->ecxt_param_list_info = params;
+ prunestate = CreatePartitionPruneState(NULL, pruneinfo, true, false,
+ rtable, econtext, pdir);
+ valid_subplan_offs = ExecFindMatchingSubPlans(prunestate, true,
+ scan_leafpart_rtis);
+
+ FreeExprContext(econtext, true);
+ DestroyPartitionDirectory(pdir);
+
+ return valid_subplan_offs;
+}
+
/*
* CreatePartitionPruneState
* Build the data structure required for calling ExecFindMatchingSubPlans
*
- * 'planstate' is the parent plan node's execution state.
+ * 'planstate', if not NULL, is the parent plan node's execution state. It
+ * can be NULL if being called before ExecutorStart(), in which case,
+ * 'rtable' (range table), 'econtext', and 'partdir' must be explicitly
+ * provided.
*
* 'pruneinfo' is a PartitionPruneInfo as generated by
* make_partition_pruneinfo. Here we build a PartitionPruneState containing a
@@ -1867,19 +1973,21 @@ ExecInitPartitionPruning(PlanState *planstate,
* PartitionedRelPruneInfo.
*/
static PartitionPruneState *
-CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
+CreatePartitionPruneState(PlanState *planstate,
+ PartitionPruneInfo *pruneinfo,
+ bool consider_initial_steps,
+ bool consider_exec_steps,
+ List *rtable, ExprContext *econtext,
+ PartitionDirectory partdir)
{
- EState *estate = planstate->state;
+ EState *estate = planstate ? planstate->state : NULL;
PartitionPruneState *prunestate;
int n_part_hierarchies;
ListCell *lc;
int i;
- ExprContext *econtext = planstate->ps_ExprContext;
- /* For data reading, executor always omits detached partitions */
- if (estate->es_partition_directory == NULL)
- estate->es_partition_directory =
- CreatePartitionDirectory(estate->es_query_cxt, false);
+ Assert((estate != NULL) ||
+ (partdir != NULL && econtext != NULL && rtable != NIL));
n_part_hierarchies = list_length(pruneinfo->prune_infos);
Assert(n_part_hierarchies > 0);
@@ -1934,15 +2042,39 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
PartitionKey partkey;
/*
- * We can rely on the copies of the partitioned table's partition
- * key and partition descriptor appearing in its relcache entry,
- * because that entry will be held open and locked for the
- * duration of this executor run.
+ * Must open the relation by ourselves when called before the
+ * execution has started, such as, when called from
+ * CachedPlanLockPartitions(). In that case, sub-partitions must
+ * be locked, because AcquirePlannerLocks() would have locked only
+ * the root parent.
+ */
+ if (estate == NULL)
+ {
+ RangeTblEntry *rte = rt_fetch(pinfo->rtindex, rtable);
+ int lockmode = (j == 0) ? NoLock : rte->rellockmode;
+
+ partrel = table_open(rte->relid, lockmode);
+ }
+ else
+ partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+
+ /*
+ * We can rely on the copy of the partitioned table's partition
+ * key in its relcache entry, because it can't change (or
+ * get destroyed) as long as the relation is locked. Partition
+ * descriptor is taken from the PartitionDirectory associated with
+ * the table that is held open long enough for the descriptor to
+ * remain valid while it's used to perform the pruning steps.
*/
- partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
partkey = RelationGetPartitionKey(partrel);
- partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
- partrel);
+ partdesc = PartitionDirectoryLookup(partdir, partrel);
+
+ /*
+ * Must close partrel, keeping the lock taken, if we're not using
+ * EState's entry.
+ */
+ if (estate == NULL)
+ table_close(partrel, NoLock);
/*
* Initialize the subplan_map and subpart_map.
@@ -2050,7 +2182,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
* Initialize pruning contexts as needed.
*/
pprune->initial_pruning_steps = pinfo->initial_pruning_steps;
- if (pinfo->initial_pruning_steps)
+ if (consider_initial_steps && pinfo->initial_pruning_steps)
{
InitPartitionPruneContext(&pprune->initial_context,
pinfo->initial_pruning_steps,
@@ -2060,7 +2192,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
prunestate->do_initial_prune = true;
}
pprune->exec_pruning_steps = pinfo->exec_pruning_steps;
- if (pinfo->exec_pruning_steps)
+ if (consider_exec_steps && pinfo->exec_pruning_steps)
{
InitPartitionPruneContext(&pprune->exec_context,
pinfo->exec_pruning_steps,
@@ -2288,10 +2420,14 @@ PartitionPruneFixSubPlanMap(PartitionPruneState *prunestate,
* Pass initial_prune if PARAM_EXEC Params cannot yet be evaluated. This
* differentiates the initial executor-time pruning step from later
* runtime pruning.
+ *
+ * RT indexes of leaf partitions scanned by the chosen subplans are added to
+ * *scan_leafpart_rtis if the pointer is non-NULL.
*/
Bitmapset *
ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune)
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *result = NULL;
MemoryContext oldcontext;
@@ -2326,7 +2462,7 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
*/
pprune = &prunedata->partrelprunedata[0];
find_matching_subplans_recurse(prunedata, pprune, initial_prune,
- &result);
+ &result, scan_leafpart_rtis);
/* Expression eval may have used space in ExprContext too */
if (pprune->exec_pruning_steps)
@@ -2340,6 +2476,8 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
/* Copy result out of the temp context before we reset it */
result = bms_copy(result);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_copy(*scan_leafpart_rtis);
MemoryContextReset(prunestate->prune_context);
@@ -2350,13 +2488,15 @@ ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
* find_matching_subplans_recurse
* Recursive worker function for ExecFindMatchingSubPlans
*
- * Adds valid (non-prunable) subplan IDs to *validsubplans
+ * Adds valid (non-prunable) subplan IDs to *validsubplans and RT indexes of
+ * the corresponding leaf partitions to *scan_leafpart_rtis (if asked for).
*/
static void
find_matching_subplans_recurse(PartitionPruningData *prunedata,
PartitionedRelPruningData *pprune,
bool initial_prune,
- Bitmapset **validsubplans)
+ Bitmapset **validsubplans,
+ Bitmapset **scan_leafpart_rtis)
{
Bitmapset *partset;
int i;
@@ -2383,8 +2523,14 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
while ((i = bms_next_member(partset, i)) >= 0)
{
if (pprune->subplan_map[i] >= 0)
+ {
*validsubplans = bms_add_member(*validsubplans,
pprune->subplan_map[i]);
+ Assert(pprune->rti_map[i] > 0);
+ if (scan_leafpart_rtis)
+ *scan_leafpart_rtis = bms_add_member(*scan_leafpart_rtis,
+ pprune->rti_map[i]);
+ }
else
{
int partidx = pprune->subpart_map[i];
@@ -2392,7 +2538,8 @@ find_matching_subplans_recurse(PartitionPruningData *prunedata,
if (partidx >= 0)
find_matching_subplans_recurse(prunedata,
&prunedata->partrelprunedata[partidx],
- initial_prune, validsubplans);
+ initial_prune, validsubplans,
+ scan_leafpart_rtis);
else
{
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 87f4d53ca7..7d36c972d3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -139,6 +139,7 @@ CreateExecutorState(void)
estate->es_param_exec_vals = NULL;
estate->es_queryEnv = NULL;
+ estate->es_part_prune_results = NIL;
estate->es_query_cxt = qcontext;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index dc13625171..bffb42ce71 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -842,7 +842,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
- es->qd = CreateQueryDesc(es->stmt,
+ es->qd = CreateQueryDesc(es->stmt, NIL,
fcache->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 99830198bd..3b917584de 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -156,7 +156,8 @@ ExecInitAppend(Append *node, EState *estate, int eflags)
* subplan, we can fill as_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (appendstate->as_prune_state == NULL ||
+ (!appendstate->as_prune_state->do_exec_prune && nplans > 0))
appendstate->as_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -578,7 +579,7 @@ choose_next_subplan_locally(AppendState *node)
}
else if (node->as_valid_subplans == NULL)
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
whichplan = -1;
}
@@ -643,7 +644,7 @@ choose_next_subplan_for_leader(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
/*
* Mark each invalid plan as finished to allow the loop below to
@@ -718,7 +719,7 @@ choose_next_subplan_for_worker(AppendState *node)
else if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
mark_invalid_subplans_as_finished(node);
}
@@ -869,7 +870,7 @@ ExecAppendAsyncBegin(AppendState *node)
if (node->as_valid_subplans == NULL)
{
node->as_valid_subplans =
- ExecFindMatchingSubPlans(node->as_prune_state, false);
+ ExecFindMatchingSubPlans(node->as_prune_state, false, NULL);
classify_matching_subplans(node);
}
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index f370f9f287..ccfa083945 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -104,7 +104,8 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
* subplan, we can fill ms_valid_subplans immediately, preventing
* later calls to ExecFindMatchingSubPlans.
*/
- if (!prunestate->do_exec_prune && nplans > 0)
+ if (mergestate->ms_prune_state == NULL ||
+ (!mergestate->ms_prune_state->do_exec_prune && nplans > 0))
mergestate->ms_valid_subplans = bms_add_range(NULL, 0, nplans - 1);
}
else
@@ -219,7 +220,7 @@ ExecMergeAppend(PlanState *pstate)
*/
if (node->ms_valid_subplans == NULL)
node->ms_valid_subplans =
- ExecFindMatchingSubPlans(node->ms_prune_state, false);
+ ExecFindMatchingSubPlans(node->ms_prune_state, false, NULL);
/*
* First time through: pull the first tuple from each valid subplan,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index fd5796f1b9..2ecb9193aa 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1577,6 +1577,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra;
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1657,7 +1658,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,6 +1690,9 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
stmt_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/*
* Set up options for portal. Default SCROLL type is chosen the same way
* as PerformCursorOpen does it.
@@ -2067,6 +2075,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2092,8 +2101,12 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cplan_extra);
Assert(cplan == plansource->gplan);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
@@ -2399,6 +2412,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
CachedPlan *cplan = NULL;
+ CachedPlanExtra *cplan_extra = NULL;
ListCell *lc1;
/*
@@ -2549,8 +2563,12 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* plan, the refcount must be backed by the plan_owner.
*/
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
stmt_list = cplan->stmt_list;
/*
@@ -2592,9 +2610,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ List *part_prune_results = NIL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ if (cplan_extra)
+ part_prune_results = list_nth_node(List,
+ cplan_extra->part_prune_results_list,
+ foreach_current_index(lc2));
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2663,7 +2686,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
else
snap = InvalidSnapshot;
- qdesc = CreateQueryDesc(stmt,
+ qdesc = CreateQueryDesc(stmt, part_prune_results,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ed43d5936d..db27cae297 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -372,6 +372,7 @@ set_plan_references(PlannerInfo *root, Plan *plan)
{
PartitionPruneInfo *pruneinfo = lfirst(lc);
ListCell *l;
+ Bitmapset *leafpart_rtis = NULL;
pruneinfo->root_parent_relids =
offset_relid_set(pruneinfo->root_parent_relids, rtoffset);
@@ -383,17 +384,52 @@ set_plan_references(PlannerInfo *root, Plan *plan)
foreach(l2, prune_infos)
{
PartitionedRelPruneInfo *pinfo = lfirst(l2);
+ int i;
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
+
+ /* Also of the leaf partitions that might be scanned. */
+ for (i = 0; i < pinfo->nparts; i++)
+ {
+ if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
+ {
+ pinfo->rti_map[i] += rtoffset;
+ leafpart_rtis = bms_add_member(leafpart_rtis,
+ pinfo->rti_map[i]);
+ }
+ }
}
}
+ if (pruneinfo->needs_init_pruning)
+ {
+ glob->containsInitialPruning = true;
+
+ /*
+ * Delete the leaf partition RTIs from the set of relations to be
+ * locked by AcquireExecutorLocks(). The actual set of leaf
+ * partitions to be locked is computed by
+ * CachedPlanLockPartitions().
+ */
+ glob->minLockRelids = bms_del_members(glob->minLockRelids,
+ leafpart_rtis);
+ }
+
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
glob->containsInitialPruning |= pruneinfo->needs_init_pruning;
}
+ /*
+ * It seems worth doing a bms_copy() on glob->minLockRelids if we deleted
+ * bits from it above to get rid of any empty tail bits. It seems better
+ * for the loop over this set in AcquireExecutorLocks() to not have to go
+ * through those useless bit words.
+ */
+ if (glob->containsInitialPruning)
+ glob->minLockRelids = bms_copy(glob->minLockRelids);
+
return result;
}
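The set_plan_references() hunk above does two things per PartitionedRelPruneInfo: it shifts the leaf partitions' RT indexes by rtoffset while collecting them, and, when initial pruning is needed, it removes them from the set that AcquireExecutorLocks() will lock. A minimal standalone sketch of that bookkeeping follows. It is only an illustration under assumptions: DemoPruneInfo, DemoRelids and the demo_* functions are hypothetical names, and a single 64-bit word stands in for PostgreSQL's Bitmapset.

```c
#include <assert.h>
#include <stdint.h>

/* Toy bitmapset (assumption): bit i set means RT index i is a member. */
typedef uint64_t DemoRelids;

/* Hypothetical, trimmed-down stand-in for PartitionedRelPruneInfo. */
typedef struct DemoPruneInfo
{
    int  nparts;
    int *rti_map;      /* RT index per partition, 0 if there is none */
    int *subplan_map;  /* subplan index for leaf partitions, else -1 */
} DemoPruneInfo;

static DemoRelids
demo_bit(int rti)
{
    assert(rti > 0 && rti < 64);
    return UINT64_C(1) << rti;
}

/*
 * Offset the RT indexes of leaf partitions that have a subplan and
 * return the set of the adjusted indexes.
 */
static DemoRelids
demo_collect_leaf_rtis(DemoPruneInfo *pinfo, int rtoffset)
{
    DemoRelids leafs = 0;

    for (int i = 0; i < pinfo->nparts; i++)
    {
        if (pinfo->rti_map[i] > 0 && pinfo->subplan_map[i] >= 0)
        {
            pinfo->rti_map[i] += rtoffset;
            leafs |= demo_bit(pinfo->rti_map[i]);
        }
    }
    return leafs;
}

/*
 * The "minimal" lock set: everything, minus leaf partitions whose
 * locking is deferred until after initial pruning.
 */
static DemoRelids
demo_min_lock_relids(DemoRelids all, DemoRelids leafs, int needs_init_pruning)
{
    return needs_init_pruning ? (all & ~leafs) : all;
}
```

With partitions at RT indexes 2 and 4 and an offset of 10, the collected set holds 12 and 14; removing it from a set that also contains the parent at 11 leaves only the parent to be locked up front.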
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 01d264b5ab..e11e07658d 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1598,6 +1598,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanExtra *cplan_extra = NULL;
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -1972,7 +1973,10 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cplan_extra);
+ Assert(cplan_extra == NULL ||
+ (list_length(cplan->stmt_list) ==
+ list_length(cplan_extra->part_prune_results_list)));
/*
* Now we can define the portal.
@@ -1987,6 +1991,9 @@ exec_bind_message(StringInfo input_message)
cplan->stmt_list,
cplan);
+ if (cplan_extra)
+ PortalSaveCachedPlanExtra(portal, cplan_extra);
+
/* Done with the snapshot used for parameter I/O and parsing/planning */
if (snapshot_set)
PopActiveSnapshot();
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 52e2db6452..32e6b7b767 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -35,7 +35,7 @@
Portal ActivePortal = NULL;
-static void ProcessQuery(PlannedStmt *plan,
+static void ProcessQuery(PlannedStmt *plan, List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -65,6 +65,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -77,6 +78,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->part_prune_results = part_prune_results;
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -122,6 +124,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * part_prune_results: pruning results returned by CachedPlanLockPartitions()
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -134,6 +137,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ List *part_prune_results,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -145,7 +149,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, part_prune_results, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -491,8 +495,13 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * There is no PartitionPruneResult unless the PlannedStmt is
+ * from a CachedPlan.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->cplan_extra == NULL ? NIL :
+ linitial(portal->cplan_extra->part_prune_results_list),
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1225,6 +1234,8 @@ PortalRunMulti(Portal portal,
if (pstmt->utilityStmt == NULL)
{
+ List *part_prune_results = NIL;
+
/*
* process a plannable query.
*/
@@ -1271,10 +1282,19 @@ PortalRunMulti(Portal portal,
else
UpdateActiveSnapshotCommandId();
+ /*
+ * Determine if there's a corresponding List of PartitionPruneResult
+ * for this PlannedStmt.
+ */
+ if (portal->cplan_extra)
+ part_prune_results = list_nth_node(List,
+ portal->cplan_extra->part_prune_results_list,
+ foreach_current_index(stmtlist_item));
+
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1283,7 +1303,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, part_prune_results,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 339bb603f7..16b9869fae 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -59,6 +59,7 @@
#include "access/transam.h"
#include "catalog/namespace.h"
#include "executor/executor.h"
+#include "executor/execPartition.h"
#include "miscadmin.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/optimizer.h"
@@ -99,14 +100,18 @@ static dlist_head cached_expression_list = DLIST_STATIC_INIT(cached_expression_l
static void ReleaseGenericPlan(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, bool *hasUnlockedParts);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static bool AcquireExecutorLocks(List *stmt_list, bool acquire);
+static bool CachedPlanLockPartitions(CachedPlanSource *plansource,
+ ParamListInfo boundParams,
+ ResourceOwner owner,
+ CachedPlanExtra **extra);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -783,16 +788,23 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid and
+ * set *hasUnlockedParts if any PlannedStmt contains "initially" prunable
+ * subnodes; partitions are not locked till initial pruning is done.
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
+ * On a "true" return, we have acquired the minimal set of locks needed to run
+ * the plan, that is, excluding partitions that are subject to being pruned
+ * before execution. The caller must do the initial pruning and lock the
+ * partitions that survive it before actually telling the world that the
+ * plan is "valid".
+ *
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, bool *hasUnlockedParts)
{
CachedPlan *plan = plansource->gplan;
@@ -826,7 +838,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ *hasUnlockedParts = AcquireExecutorLocks(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -848,7 +860,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ (void) AcquireExecutorLocks(plan->stmt_list, false);
}
/*
@@ -1120,14 +1132,17 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
}
/*
- * GetCachedPlan: get a cached plan from a CachedPlanSource.
+ * GetCachedPlan: get a cached plan from a CachedPlanSource
*
* This function hides the logic that decides whether to use a generic
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
* On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * execution. If the plan is a generic plan containing prunable partitions,
+ * the locks on partitions are taken after the pruning and the result of that
+ * pruning is saved in (*extra)->part_prune_results_list for the caller to pass
+ * to the executor, along with plan->stmt_list.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1139,12 +1154,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanExtra **extra)
{
CachedPlan *plan = NULL;
List *qlist;
bool customplan;
+ Assert(extra != NULL);
+ *extra = NULL;
+
/* Assert caller is doing things in a sane order */
Assert(plansource->magic == CACHEDPLANSOURCE_MAGIC);
Assert(plansource->is_complete);
@@ -1160,7 +1179,11 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ bool hasUnlockedParts = false;
+
+ if (CheckCachedPlan(plansource, &hasUnlockedParts) &&
+ (!hasUnlockedParts ||
+ CachedPlanLockPartitions(plansource, boundParams, owner, extra)))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1282,6 +1305,147 @@ ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner)
}
}
+/*
+ * For each PlannedStmt in the generic plan, do the "initial" partition pruning
+ * if needed and lock only partitions that survive.
+ *
+ * On return, (*extra)->part_prune_results_list will contain an element for
+ * each PlannedStmt in the generic plan's stmt_list, which is NIL if the
+ * PlannedStmt does not contain any PartitionPruneInfos requiring initial
+ * pruning or a List of PartitionPruneResult containing elements corresponding
+ * to the PartitionPruneInfos in PlannedStmt.partPruneInfos.
+ */
+static bool
+CachedPlanLockPartitions(CachedPlanSource *plansource,
+ ParamListInfo boundParams,
+ ResourceOwner owner,
+ CachedPlanExtra **extra)
+{
+ CachedPlan *plan = plansource->gplan;
+ List *part_prune_results_list = NIL;
+ List *lockedRelids_per_stmt = NIL;
+ ListCell *lc1,
+ *lc2;
+ MemoryContext oldcontext,
+ tmpcontext;
+
+ /*
+ * Won't be here without CheckCachedPlan() having validated a generic
+ * plan.
+ */
+ Assert(plansource->gplan != NULL);
+
+ /*
+ * Create a temporary context for memory allocations required while
+ * executing partition pruning steps.
+ */
+ tmpcontext = AllocSetContextCreate(CurrentMemoryContext,
+ "CachedPlanLockPartitions() working data",
+ ALLOCSET_DEFAULT_SIZES);
+ oldcontext = MemoryContextSwitchTo(tmpcontext);
+ foreach(lc1, plan->stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ Bitmapset *lockPartRelids = NULL;
+ int rti;
+ List *part_prune_results = NIL;
+ Bitmapset *lockedRelids = NULL;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /*
+ * Ignore utility statements, because AcquireExecutorLocks on the
+ * parent CachedPlan would have dealt with these. Though, do let
+ * the caller know that no pruning is applicable to this statement.
+ */
+ part_prune_results_list = lappend(part_prune_results_list, NIL);
+ lockedRelids_per_stmt = lappend(lockedRelids_per_stmt, NULL);
+ continue;
+ }
+
+ /* Figure out the partitions that would need to be locked. */
+ if (plannedstmt->containsInitialPruning)
+ {
+ foreach(lc2, plannedstmt->partPruneInfos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc2);
+ PartitionPruneResult *pruneresult = makeNode(PartitionPruneResult);
+
+ pruneresult->root_parent_relids =
+ bms_copy(pruneinfo->root_parent_relids);
+ pruneresult->valid_subplan_offs =
+ ExecPartitionDoInitialPruning(plannedstmt, boundParams,
+ pruneinfo,
+ &lockPartRelids);
+ part_prune_results = lappend(part_prune_results, pruneresult);
+ }
+ }
+
+ /* Lock 'em. */
+ rti = -1;
+ while ((rti = bms_next_member(lockPartRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ LockRelationOid(rte->relid, rte->rellockmode);
+ lockedRelids = bms_add_member(lockedRelids, rti);
+ }
+
+ part_prune_results_list = lappend(part_prune_results_list,
+ part_prune_results);
+ lockedRelids_per_stmt = lappend(lockedRelids_per_stmt,
+ lockedRelids);
+ }
+
+ /*
+ * If the plan is still valid, set *extra, returning in it a copy of the
+ * pruning results obtained above, allocated in the caller's context.
+ */
+ MemoryContextSwitchTo(oldcontext);
+ if (plan->is_valid)
+ {
+ *extra = (CachedPlanExtra *) palloc(sizeof(CachedPlanExtra));
+ (*extra)->part_prune_results_list = copyObject(part_prune_results_list);
+ }
+ else
+ {
+ /*
+ * Release the now useless locks. Note that this is the same as what
+ * CheckCachedPlan() does when the locks taken by
+ * AcquireExecutorLocks() cause the plan to be invalidated.
+ */
+ forboth(lc1, plan->stmt_list, lc2, lockedRelids_per_stmt)
+ {
+ PlannedStmt *plannedstmt = lfirst(lc1);
+ Bitmapset *lockedRelids = lfirst(lc2);
+ int rti;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ continue;
+ rti = -1;
+ while ((rti = bms_next_member(lockedRelids, rti)) > 0)
+ {
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
+
+ Assert(rte->rtekind == RTE_RELATION);
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+ }
+ }
+
+ /* Clear up the temporary context. */
+ MemoryContextDelete(tmpcontext);
+ return plan->is_valid;
+}
+
/*
* CachedPlanAllowsSimpleValidityCheck: can we use CachedPlanIsSimplyValid?
*
@@ -1738,11 +1902,16 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * If some PlannedStmt(s) contain "initially prunable" partitions, they are not
+ * locked here. Instead, the caller is informed of their existence so that it
+ * can lock them after doing the initial pruning.
*/
-static void
+static bool
AcquireExecutorLocks(List *stmt_list, bool acquire)
{
ListCell *lc1;
+ bool hasUnlockedParts = false;
foreach(lc1, stmt_list)
{
@@ -1763,10 +1932,17 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
Assert(plannedstmt->minLockRelids == NULL);
if (query)
ScanQueryForLocks(query, acquire);
continue;
}
+ /*
+ * If partitions can be pruned before execution, defer their locking to
+ * the caller.
+ */
+ if (plannedstmt->containsInitialPruning)
+ hasUnlockedParts = true;
+
allLockRelids = plannedstmt->minLockRelids;
rti = -1;
while ((rti = bms_next_member(allLockRelids, rti)) > 0)
@@ -1788,6 +1964,8 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
UnlockRelationOid(rte->relid, rte->rellockmode);
}
}
+
+ return hasUnlockedParts;
}
/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 7b1ae6fdcf..94a9db84e3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -303,6 +303,22 @@ PortalDefineQuery(Portal portal,
portal->status = PORTAL_DEFINED;
}
+/*
+ * Copies the given CachedPlanExtra struct into the portal.
+ */
+void
+PortalSaveCachedPlanExtra(Portal portal, CachedPlanExtra *extra)
+{
+ MemoryContext oldcxt = MemoryContextSwitchTo(portal->portalContext);
+
+ Assert(portal->cplan_extra == NULL && extra != NULL);
+ portal->cplan_extra = (CachedPlanExtra *)
+ palloc(sizeof(CachedPlanExtra));
+ portal->cplan_extra->part_prune_results_list =
+ copyObject(extra->part_prune_results_list);
+ MemoryContextSwitchTo(oldcxt);
+}
+
/*
* PortalReleaseCachedPlan
* Release a portal's reference to its cached plan, if any.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 9ebde089ae..269cc4d562 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -87,7 +87,9 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt,
+ List *part_prune_results,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index aeeaeb7884..4b98d0d2ef 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -129,5 +129,10 @@ extern PartitionPruneState *ExecInitPartitionPruning(PlanState *planstate,
Bitmapset *root_parent_relids,
Bitmapset **initially_valid_subplans);
extern Bitmapset *ExecFindMatchingSubPlans(PartitionPruneState *prunestate,
- bool initial_prune);
+ bool initial_prune,
+ Bitmapset **scan_leafpart_rtis);
+extern Bitmapset *ExecPartitionDoInitialPruning(PlannedStmt *plannedstmt,
+ ParamListInfo params,
+ PartitionPruneInfo *pruneinfo,
+ Bitmapset **scan_leafpart_rtis);
#endif /* EXECPARTITION_H */
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index e79e2c001f..5a7d075750 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,6 +35,8 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ List *part_prune_results; /* PartitionPruneResults returned by
+ * CachedPlanLockPartitions() */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +59,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ List *part_prune_results,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 9a64a830a2..f1374057e5 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -617,6 +617,7 @@ typedef struct EState
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
List *es_part_prune_infos; /* PlannedStmt.partPruneInfos */
+ List *es_part_prune_results; /* QueryDesc.part_prune_results */
const char *es_sourceText; /* Source text from QueryDesc */
JunkFilter *es_junkFilter; /* top-level junk filter, if any */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 4337e7aa34..10f12e780e 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -134,8 +134,8 @@ typedef struct PlannerGlobal
bool containsInitialPruning;
/*
- * Indexes of all range table entries; for AcquireExecutorLocks()'s
- * perusal.
+ * Indexes of all range table entries except those of leaf partitions
+ * scanned by prunable subplans; for AcquireExecutorLocks()'s perusal.
*/
Bitmapset *minLockRelids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index eb0a007946..ab8bc74e4a 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -82,7 +82,9 @@ typedef struct PlannedStmt
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
- Bitmapset *minLockRelids; /* Indexes of all range table entries; for
+ Bitmapset *minLockRelids; /* Indexes of all range table entries except
+ * those of leaf partitions scanned by
+ * prunable subplans; for
* AcquireExecutorLocks()'s perusal */
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
@@ -1575,6 +1577,33 @@ typedef struct PartitionPruneStepCombine
List *source_stepids;
} PartitionPruneStepCombine;
+/*----------------
+ * PartitionPruneResult
+ *
+ * The result of performing ExecPartitionDoInitialPruning() on a given
+ * PartitionPruneInfo.
+ *
+ * root_parent_relids is the same as PartitionPruneInfo.root_parent_relids. It's
+ * there for cross-checking in ExecInitPartitionPruning() that the
+ * PartitionPruneResult and the PartitionPruneInfo at a given index in
+ * EState.es_part_prune_results and EState.es_part_prune_infos, respectively,
+ * belong to the same parent plan node.
+ *
+ * valid_subplan_offs contains the indexes of subplans remaining after
+ * performing initial pruning by calling ExecFindMatchingSubPlans() on the
+ * PartitionPruneInfo.
+ *
+ * This is used to store the result of initial partition pruning that is
+ * performed before the execution has started, such as in
+ * CachedPlanLockPartitions().
+ */
+typedef struct PartitionPruneResult
+{
+ NodeTag type;
+
+ Bitmapset *root_parent_relids;
+ Bitmapset *valid_subplan_offs;
+} PartitionPruneResult;
/*
* Plan invalidation info
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 0499635f59..4ac66d2761 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -160,6 +160,14 @@ typedef struct CachedPlan
MemoryContext context; /* context containing this CachedPlan */
} CachedPlan;
+/*
+ * Additional information to pass to the executor when executing a CachedPlan.
+ */
+typedef struct CachedPlanExtra
+{
+ List *part_prune_results_list;
+} CachedPlanExtra;
+
/*
* CachedExpression is a low-overhead mechanism for caching the planned form
* of standalone scalar expressions. While such expressions are not usually
@@ -220,7 +228,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanExtra **extra);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index aeddbdafe5..49bb00cda5 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,6 +138,8 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
+ CachedPlanExtra *cplan_extra; /* CachedPlanExtra for cplan in Portal's
+ * memory */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -242,6 +244,7 @@ extern void PortalDefineQuery(Portal portal,
CommandTag commandTag,
List *stmts,
CachedPlan *cplan);
+extern void PortalSaveCachedPlanExtra(Portal portal, CachedPlanExtra *extra);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.35.3
[application/octet-stream] v30-0001-Preparatory-refactoring-before-reworking-CachedP.patch (17.2K, 3-v30-0001-Preparatory-refactoring-before-reworking-CachedP.patch)
From 22c64b3d1ade0cb0f413c17d84a9bb0dd4e6d734 Mon Sep 17 00:00:00 2001
From: amitlan <[email protected]>
Date: Tue, 13 Dec 2022 11:58:07 +0900
Subject: [PATCH v30 1/2] Preparatory refactoring before reworking CachedPlan
locking
Remember the RT indexes of RTEs that AcquireExecutorLocks() must
look at to consider locking in a bitmapset, so that instead of looping
over the range table to find those RTEs, it can look them up using
the RT indexes set in the bitmapset.
This also adds some extra information related to execution-time
pruning to the relevant plan nodes.
---
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 6 ++++
src/backend/nodes/readfuncs.c | 8 ++++--
src/backend/optimizer/plan/planner.c | 2 ++
src/backend/optimizer/plan/setrefs.c | 12 ++++++++
src/backend/partitioning/partprune.c | 42 ++++++++++++++++++++++++++--
src/backend/utils/cache/plancache.c | 10 +++++--
src/include/executor/execPartition.h | 2 ++
src/include/nodes/nodes.h | 1 +
src/include/nodes/pathnodes.h | 11 ++++++++
src/include/nodes/plannodes.h | 19 +++++++++++++
11 files changed, 106 insertions(+), 8 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index a5b8e43ec5..65c4b63bbd 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -182,6 +182,7 @@ ExecSerializePlan(Plan *plan, EState *estate)
pstmt->transientPlan = false;
pstmt->dependsOnRole = false;
pstmt->parallelModeNeeded = false;
+ pstmt->containsInitialPruning = false; /* workers need not know! */
pstmt->planTree = plan;
pstmt->partPruneInfos = estate->es_part_prune_infos;
pstmt->rtable = estate->es_range_table;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 76d79b9741..5b62157712 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1956,6 +1956,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
Assert(partdesc->nparts >= pinfo->nparts);
pprune->nparts = partdesc->nparts;
pprune->subplan_map = palloc(sizeof(int) * partdesc->nparts);
+ pprune->rti_map = palloc(sizeof(Index) * partdesc->nparts);
if (partdesc->nparts == pinfo->nparts)
{
/*
@@ -1966,6 +1967,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pprune->subpart_map = pinfo->subpart_map;
memcpy(pprune->subplan_map, pinfo->subplan_map,
sizeof(int) * pinfo->nparts);
+ memcpy(pprune->rti_map, pinfo->rti_map,
+ sizeof(Index) * pinfo->nparts);
/*
* Double-check that the list of unpruned relations has not
@@ -2016,6 +2019,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
pinfo->subplan_map[pd_idx];
pprune->subpart_map[pp_idx] =
pinfo->subpart_map[pd_idx];
+ pprune->rti_map[pp_idx] =
+ pinfo->rti_map[pd_idx];
pd_idx++;
}
else
@@ -2023,6 +2028,7 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
/* this partdesc entry is not in the plan */
pprune->subplan_map[pp_idx] = -1;
pprune->subpart_map[pp_idx] = -1;
+ pprune->rti_map[pp_idx] = 0;
}
}
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 966b75f5a6..1161671fa4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -158,6 +158,11 @@
token = pg_strtok(&length); /* skip :fldname */ \
local_node->fldname = readIntCols(len)
+/* Read an Index array */
+#define READ_INDEX_ARRAY(fldname, len) \
+ token = pg_strtok(&length); /* skip :fldname */ \
+ local_node->fldname = readIndexCols(len)
+
/* Read a bool array */
#define READ_BOOL_ARRAY(fldname, len) \
token = pg_strtok(&length); /* skip :fldname */ \
@@ -796,7 +801,6 @@ fnname(int numCols) \
*/
READ_SCALAR_ARRAY(readAttrNumberCols, int16, atoi)
READ_SCALAR_ARRAY(readOidCols, Oid, atooid)
-/* outfuncs.c has writeIndexCols, but we don't yet need that here */
-/* READ_SCALAR_ARRAY(readIndexCols, Index, atoui) */
+READ_SCALAR_ARRAY(readIndexCols, Index, atoui)
READ_SCALAR_ARRAY(readIntCols, int, atoi)
READ_SCALAR_ARRAY(readBoolCols, bool, strtobool)
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5dd4f92720..620b163ef9 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -523,8 +523,10 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->parallelModeNeeded = glob->parallelModeNeeded;
result->planTree = top_plan;
result->partPruneInfos = glob->partPruneInfos;
+ result->containsInitialPruning = glob->containsInitialPruning;
result->rtable = glob->finalrtable;
result->permInfos = glob->finalrteperminfos;
+ result->minLockRelids = glob->minLockRelids;
result->resultRelations = glob->resultRelations;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 596f1fbc8e..ed43d5936d 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -279,6 +279,16 @@ set_plan_references(PlannerInfo *root, Plan *plan)
*/
add_rtes_to_flat_rtable(root, false);
+ /*
+ * Add the query's adjusted range of RT indexes to glob->minLockRelids.
+ * The adjusted RT indexes of prunable relations will be deleted from the
+ * set below where PartitionPruneInfos are processed.
+ */
+ glob->minLockRelids =
+ bms_add_range(glob->minLockRelids,
+ rtoffset + 1,
+ rtoffset + list_length(root->parse->rtable));
+
/*
* Adjust RT indexes of PlanRowMarks and add to final rowmarks list
*/
@@ -377,9 +387,11 @@ set_plan_references(PlannerInfo *root, Plan *plan)
/* RT index of the table to which the pinfo belongs. */
pinfo->rtindex += rtoffset;
}
+
}
glob->partPruneInfos = lappend(glob->partPruneInfos, pruneinfo);
+ glob->containsInitialPruning |= pruneinfo->needs_init_pruning;
}
return result;
diff --git a/src/backend/partitioning/partprune.c b/src/backend/partitioning/partprune.c
index d48f6784c1..56270d7670 100644
--- a/src/backend/partitioning/partprune.c
+++ b/src/backend/partitioning/partprune.c
@@ -144,7 +144,9 @@ static List *make_partitionedrel_pruneinfo(PlannerInfo *root,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans);
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning);
static void gen_partprune_steps(RelOptInfo *rel, List *clauses,
PartClauseTarget target,
GeneratePruningStepsContext *context);
@@ -234,6 +236,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *relid_subplan_map;
ListCell *lc;
int i;
+ bool needs_init_pruning = false;
+ bool needs_exec_pruning = false;
/*
* Scan the subpaths to see which ones are scans of partition child
@@ -313,12 +317,16 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
Bitmapset *partrelids = (Bitmapset *) lfirst(lc);
List *pinfolist;
Bitmapset *matchedsubplans = NULL;
+ bool partrel_needs_init_pruning;
+ bool partrel_needs_exec_pruning;
pinfolist = make_partitionedrel_pruneinfo(root, parentrel,
prunequal,
partrelids,
relid_subplan_map,
- &matchedsubplans);
+ &matchedsubplans,
+ &partrel_needs_init_pruning,
+ &partrel_needs_exec_pruning);
/* When pruning is possible, record the matched subplans */
if (pinfolist != NIL)
@@ -327,6 +335,9 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
allmatchedsubplans = bms_join(matchedsubplans,
allmatchedsubplans);
}
+
+ needs_init_pruning |= partrel_needs_init_pruning;
+ needs_exec_pruning |= partrel_needs_exec_pruning;
}
pfree(relid_subplan_map);
@@ -342,6 +353,8 @@ make_partition_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pruneinfo = makeNode(PartitionPruneInfo);
pruneinfo->root_parent_relids = parentrel->relids;
pruneinfo->prune_infos = prunerelinfos;
+ pruneinfo->needs_init_pruning = needs_init_pruning;
+ pruneinfo->needs_exec_pruning = needs_exec_pruning;
/*
* Some subplans may not belong to any of the identified partitioned rels.
@@ -442,13 +455,19 @@ add_part_relids(List *allpartrelids, Bitmapset *partrelids)
* If we cannot find any useful run-time pruning steps, return NIL.
* However, on success, each rel identified in partrelids will have
* an element in the result list, even if some of them are useless.
+ * *needs_init_pruning and *needs_exec_pruning are set to indicate whether
+ * the pruning steps contained in the returned PartitionedRelPruneInfos
+ * can be performed during executor startup and during execution,
+ * respectively.
*/
static List *
make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
List *prunequal,
Bitmapset *partrelids,
int *relid_subplan_map,
- Bitmapset **matchedsubplans)
+ Bitmapset **matchedsubplans,
+ bool *needs_init_pruning,
+ bool *needs_exec_pruning)
{
RelOptInfo *targetpart = NULL;
List *pinfolist = NIL;
@@ -459,6 +478,10 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int rti;
int i;
+ /* Will find out below. */
+ *needs_init_pruning = false;
+ *needs_exec_pruning = false;
+
/*
* Examine each partitioned rel, constructing a temporary array to map
* from planner relids to index of the partitioned rel, and building a
@@ -546,6 +569,9 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
* executor per-scan pruning steps. This first pass creates startup
* pruning steps and detects whether there's any possibly-useful quals
* that would require per-scan pruning.
+ *
+ * In the first pass, we note whether the 2nd pass is necessary by
+ * noting the presence of EXEC parameters.
*/
gen_partprune_steps(subpart, partprunequal, PARTTARGET_INITIAL,
&context);
@@ -620,6 +646,12 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->execparamids = execparamids;
/* Remaining fields will be filled in the next loop */
+ /* record which types of pruning steps we've seen so far */
+ if (initial_pruning_steps != NIL)
+ *needs_init_pruning = true;
+ if (exec_pruning_steps != NIL)
+ *needs_exec_pruning = true;
+
pinfolist = lappend(pinfolist, pinfo);
}
@@ -647,6 +679,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
int *subplan_map;
int *subpart_map;
Oid *relid_map;
+ Index *rti_map;
/*
* Construct the subplan and subpart maps for this partitioning level.
@@ -659,6 +692,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subpart_map = (int *) palloc(nparts * sizeof(int));
memset(subpart_map, -1, nparts * sizeof(int));
relid_map = (Oid *) palloc0(nparts * sizeof(Oid));
+ rti_map = (Index *) palloc0(nparts * sizeof(Index));
present_parts = NULL;
i = -1;
@@ -673,6 +707,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
subplan_map[i] = subplanidx = relid_subplan_map[partrel->relid] - 1;
subpart_map[i] = subpartidx = relid_subpart_map[partrel->relid] - 1;
relid_map[i] = planner_rt_fetch(partrel->relid, root)->relid;
+ rti_map[i] = partrel->relid;
if (subplanidx >= 0)
{
present_parts = bms_add_member(present_parts, i);
@@ -697,6 +732,7 @@ make_partitionedrel_pruneinfo(PlannerInfo *root, RelOptInfo *parentrel,
pinfo->subplan_map = subplan_map;
pinfo->subpart_map = subpart_map;
pinfo->relid_map = relid_map;
+ pinfo->rti_map = rti_map;
}
pfree(relid_subpart_map);
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index cc943205d3..339bb603f7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1747,7 +1747,8 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- ListCell *lc2;
+ Bitmapset *allLockRelids;
+ int rti;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -1760,14 +1761,17 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
*/
Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+ Assert(plannedstmt->minLockRelids == NULL);
if (query)
ScanQueryForLocks(query, acquire);
continue;
}
- foreach(lc2, plannedstmt->rtable)
+ allLockRelids = plannedstmt->minLockRelids;
+ rti = -1;
+ while ((rti = bms_next_member(allLockRelids, rti)) > 0)
{
- RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
+ RangeTblEntry *rte = rt_fetch(rti, plannedstmt->rtable);
if (rte->rtekind != RTE_RELATION)
continue;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 17fabc18c9..aeeaeb7884 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -45,6 +45,7 @@ extern void ExecCleanupTupleRouting(ModifyTableState *mtstate,
* nparts Length of subplan_map[] and subpart_map[].
* subplan_map Subplan index by partition index, or -1.
* subpart_map Subpart index by partition index, or -1.
+ * rti_map Range table index by partition index, or 0.
* present_parts A Bitmapset of the partition indexes that we
* have subplans or subparts for.
* initial_pruning_steps List of PartitionPruneSteps used to
@@ -61,6 +62,7 @@ typedef struct PartitionedRelPruningData
int nparts;
int *subplan_map;
int *subpart_map;
+ Index *rti_map;
Bitmapset *present_parts;
List *initial_pruning_steps;
List *exec_pruning_steps;
diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index 1f33902947..c2f2544df5 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -218,6 +218,7 @@ extern struct Bitmapset *readBitmapset(void);
extern uintptr_t readDatum(bool typbyval);
extern bool *readBoolCols(int numCols);
extern int *readIntCols(int numCols);
+extern Index *readIndexCols(int numCols);
extern Oid *readOidCols(int numCols);
extern int16 *readAttrNumberCols(int numCols);
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 654dba61aa..4337e7aa34 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -128,6 +128,17 @@ typedef struct PlannerGlobal
/* List of PartitionPruneInfo contained in the plan */
List *partPruneInfos;
+ /*
+ * Do any of those PartitionPruneInfos have initial pruning steps in them?
+ */
+ bool containsInitialPruning;
+
+ /*
+ * Indexes of all range table entries; for AcquireExecutorLocks()'s
+ * perusal.
+ */
+ Bitmapset *minLockRelids;
+
/* OIDs of relations the plan depends on */
List *relationOids;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index bddfe86191..eb0a007946 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -73,11 +73,18 @@ typedef struct PlannedStmt
List *partPruneInfos; /* List of PartitionPruneInfo contained in the
* plan */
+ bool containsInitialPruning; /* Do any of those PartitionPruneInfos
+ * have initial pruning steps in them?
+ */
+
List *rtable; /* list of RangeTblEntry nodes */
List *permInfos; /* list of RTEPermissionInfo nodes for rtable
* entries needing one */
+ Bitmapset *minLockRelids; /* Indexes of all range table entries; for
+ * AcquireExecutorLocks()'s perusal */
+
/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */
List *resultRelations; /* integer list of RT indexes, or NIL */
@@ -1417,6 +1424,13 @@ typedef struct PlanRowMark
* prune_infos List of Lists containing PartitionedRelPruneInfo nodes,
* one sublist per run-time-prunable partition hierarchy
* appearing in the parent plan node's subplans.
+ *
+ * needs_init_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its initial_pruning_steps set?
+ *
+ * needs_exec_pruning Does any of the PartitionedRelPruneInfos in
+ * prune_infos have its exec_pruning_steps set?
+ *
* other_subplans Indexes of any subplans that are not accounted for
* by any of the PartitionedRelPruneInfo nodes in
* "prune_infos". These subplans must not be pruned.
@@ -1428,6 +1442,8 @@ typedef struct PartitionPruneInfo
NodeTag type;
Bitmapset *root_parent_relids;
List *prune_infos;
+ bool needs_init_pruning;
+ bool needs_exec_pruning;
Bitmapset *other_subplans;
} PartitionPruneInfo;
@@ -1472,6 +1488,9 @@ typedef struct PartitionedRelPruneInfo
/* relation OID by partition index, or 0 */
Oid *relid_map pg_node_attr(array_size(nparts));
+ /* Range table index by partition index, or 0. */
+ Index *rti_map pg_node_attr(array_size(nparts));
+
/*
* initial_pruning_steps shows how to prune during executor startup (i.e.,
* without use of any PARAM_EXEC Params); it is NIL if no startup pruning
--
2.35.3
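As a rough illustration of the lookup described in this patch's commit message, the self-contained C sketch below visits only the RT indexes recorded in a set instead of scanning the whole range table. Everything in it is an assumption made for illustration: `Rte` and the `uint64_t` bitmask are hypothetical stand-ins for RangeTblEntry and Bitmapset (so at most 63 entries fit), and `next_member()` only imitates the "return -2 when exhausted" contract of bms_next_member(). It is a model of the idea, not the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for RangeTblEntry and its rtekind. */
typedef enum { RTE_RELATION, RTE_SUBQUERY } RteKind;
typedef struct { RteKind kind; unsigned relid; } Rte;

/* Smallest member greater than prev, or -2 when there is none. */
static int next_member(uint64_t set, int prev)
{
    for (int i = prev + 1; i < 64; i++)
        if (set & (UINT64_C(1) << i))
            return i;
    return -2;
}

/* Add RT indexes lo..hi (1-based, inclusive) to the set. */
static uint64_t add_range(uint64_t set, int lo, int hi)
{
    assert(lo >= 1 && hi < 64);
    for (int i = lo; i <= hi; i++)
        set |= UINT64_C(1) << i;
    return set;
}

/* Remove one RT index, e.g. a partition that initial pruning may skip. */
static uint64_t del_member(uint64_t set, int rti)
{
    assert(rti >= 1 && rti < 64);
    return set & ~(UINT64_C(1) << rti);
}

/*
 * Collect the relids to lock, looking only at the RT indexes in lock_set.
 * Returns the number of relids written to out[].
 */
static int collect_locks(const Rte *rtable, int nrte, uint64_t lock_set,
                         unsigned *out, int max_out)
{
    int n = 0;
    int rti = -1;

    while ((rti = next_member(lock_set, rti)) >= 0)
    {
        assert(rti >= 1 && rti <= nrte);
        const Rte *rte = &rtable[rti - 1];  /* RT indexes are 1-based */

        if (rte->kind != RTE_RELATION)
            continue;                       /* only plain relations are locked */
        if (n < max_out)
            out[n++] = rte->relid;
    }
    return n;
}
```

A caller would start from add_range(0, 1, nrte), drop the prunable partitions with del_member(), and pass the result to collect_locks(); the cost of the loop then follows the size of the set rather than the length of the range table.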
* Re: generic plans and "initial" pruning
@ 2022-12-21 10:18 Alvaro Herrera <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 2 replies; 82+ messages in thread
From: Alvaro Herrera @ 2022-12-21 10:18 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
This version of the patch looks not entirely unreasonable to me. I'll
set this as Ready for Committer in case David or Tom or someone else
wants to have a look and potentially commit it.
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
* Re: generic plans and "initial" pruning
@ 2022-12-21 10:47 Amit Langote <[email protected]>
parent: Alvaro Herrera <[email protected]>
1 sibling, 0 replies; 82+ messages in thread
From: Amit Langote @ 2022-12-21 10:47 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; Tom Lane <[email protected]>; pgsql-hackers
On Wed, Dec 21, 2022 at 7:18 PM Alvaro Herrera <[email protected]> wrote:
> This version of the patch looks not entirely unreasonable to me. I'll
> set this as Ready for Committer in case David or Tom or someone else
> wants to have a look and potentially commit it.
Thank you, Alvaro.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2022-12-21 15:18 Tom Lane <[email protected]>
parent: Alvaro Herrera <[email protected]>
1 sibling, 0 replies; 82+ messages in thread
From: Tom Lane @ 2022-12-21 15:18 UTC (permalink / raw)
To: Alvaro Herrera <[email protected]>; +Cc: Amit Langote <[email protected]>; Robert Haas <[email protected]>; Jacob Champion <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
Alvaro Herrera <[email protected]> writes:
> This version of the patch looks not entirely unreasonable to me. I'll
> set this as Ready for Committer in case David or Tom or someone else
> wants to have a look and potentially commit it.
I will have a look during the January CF.
regards, tom lane
* Re: generic plans and "initial" pruning
@ 2025-05-20 03:06 Tom Lane <[email protected]>
0 siblings, 2 replies; 82+ messages in thread
From: Tom Lane @ 2025-05-20 03:06 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Amit Langote <[email protected]> writes:
> Pushed after some tweaks to comments and the test case.
My attention was drawn to commit 525392d57 after observing that
Valgrind complained about a memory leak in some code that commit added
to BuildCachedPlan(). I tried to make sense of said code so I could
remove the leak, and eventually arrived at the attached patch, which
is part of a series of leak-fixing things hence the high sequence
number.
Unfortunately, the bad things I speculated about in the added comments
seem to be reality. The second attached file is a test case that
triggers
TRAP: failed Assert("list_length(plan_list) == list_length(plan->stmt_list)"), File: "plancache.c", Line: 1259, PID: 602087
because it adds a DO ALSO rule that causes the rewriter to generate
more PlannedStmts than it did before.
This is quite awful, because it does more than simply break the klugy
(and undocumented) business about keeping the top-level List in a
different context. What it means is that any outside code that is
busy iterating that List is very fundamentally broken: it's not clear
what List index it ought to resume at, except that "the one it was at"
is demonstrably incorrect.
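The hazard in the paragraph above can be modeled with a small, self-contained C sketch. It is only an illustration under stated assumptions: `StmtList` is a hypothetical fixed-capacity stand-in for the long-lived plan->stmt_list, and `replace_in_place()` is an invented helper that refuses the swap when the replanned list has a different length, roughly in the spirit of the length check in the attached patch. It does not solve the problem of where an outside iterator should resume; it only makes the mismatch detectable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_STMTS 8

/* Hypothetical stand-in for the list of PlannedStmts in a cached plan. */
typedef struct
{
    const char *stmts[MAX_STMTS];
    size_t      len;
} StmtList;

/*
 * Overwrite the list's elements in place, but only when the new list has
 * the same length.  Returns false, leaving the list untouched, otherwise:
 * an index held by code iterating the old list would no longer refer to
 * the statement it did before.
 */
static bool replace_in_place(StmtList *list,
                             const char *const *fresh, size_t fresh_len)
{
    assert(list != NULL && fresh_len <= MAX_STMTS);

    if (fresh_len != list->len)
        return false;

    for (size_t i = 0; i < fresh_len; i++)
        list->stmts[i] = fresh[i];
    return true;
}
```

With a one-element list, a one-element replacement succeeds, while a two-element replacement (as when a rule adds a statement) is rejected and the caller has to decide what to do next.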
I also don't really believe the (also undocumented) assumption that
such outside code is in between executions of PlannedStmts of the
List and hence can tolerate those being ripped out and replaced.
I have not attempted to build an example, because the one I have
seems sufficiently damning. But I bet that a recursive function
could be constructed in such a way that an outer execution is
still in progress when an inner call triggers UpdateCachedPlan.
Another small problem (much more easily fixable than the above,
probably) is that summarily setting "plan->is_valid = true"
at the end is not okay. We could already have received an
invalidation that should result in marking the plan stale.
(Holding locks on the tables involved is not sufficient to
prevent that, as there are other sources of inval events.)
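One way to picture the is_valid problem is with a generation counter, sketched below in self-contained C. This is an assumption for illustration only and not how plancache.c works: `inval_counter`, `on_invalidation()`, the two `replan_*` callbacks, and `replan_and_mark_valid()` are all hypothetical names. The point is simply that the plan is marked valid only if no invalidation arrived while it was being rebuilt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached plan. */
typedef struct { bool is_valid; } Plan;

/* Bumped by every invalidation event, whatever its source. */
static unsigned long inval_counter = 0;

static void on_invalidation(Plan *plan)
{
    inval_counter++;
    plan->is_valid = false;
}

typedef void (*ReplanFn)(Plan *plan);

/* A replan during which nothing else happens. */
static void replan_quietly(Plan *plan) { (void) plan; }

/* A replan during which an invalidation event is delivered. */
static void replan_with_inval(Plan *plan) { on_invalidation(plan); }

/*
 * Rebuild the plan, then mark it valid only if the counter did not move
 * in the meantime.  Returns true when the plan was marked valid.
 */
static bool replan_and_mark_valid(Plan *plan, ReplanFn replan)
{
    assert(plan != NULL && replan != NULL);

    unsigned long before = inval_counter;

    replan(plan);
    if (inval_counter != before)
        return false;           /* stale again; leave is_valid alone */

    plan->is_valid = true;
    return true;
}
```

Unconditionally assigning is_valid = true after replan_with_inval() would hide the event; the counter comparison keeps the plan marked stale in that case.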
It's possible that this code can be fixed, but I fear it's
going to involve some really fundamental redesign, which
probably shouldn't be happening after beta1. I think there
is no alternative but to revert for v18.
regards, tom lane
drop table if exists test_table;
CREATE TABLE test_table (a int);
create or replace function doit(r int, a int) returns bool
language plpgsql as $$
begin
raise notice 'r = %, a = %', r, a;
if (r = 10) then
CREATE RULE make_noise AS ON DELETE TO test_table
DO ALSO INSERT INTO test_table SELECT 2;
raise notice 'made rule';
end if;
if (r = 20 and a = 1) then
CREATE RULE make_noise_2 AS ON DELETE TO test_table
DO ALSO INSERT INTO test_table SELECT 3;
raise notice 'made rule 2';
end if;
return true;
end$$;
set plan_cache_mode to force_generic_plan;
DO $$
BEGIN
FOR r IN 1..30 LOOP
TRUNCATE test_table;
INSERT INTO test_table SELECT 1;
DELETE FROM test_table where doit(r,a);
END LOOP;
END$$;
table test_table;
Attachments:
[text/x-diff] v2-0010-Partially-fix-some-extremely-broken-code-from-52.patch (3.7K, 2-v2-0010-Partially-fix-some-extremely-broken-code-from-52.patch)
From a680e6b6885378beb0164e465b50afd81558ebc5 Mon Sep 17 00:00:00 2001
From: Tom Lane <[email protected]>
Date: Mon, 19 May 2025 00:02:20 -0400
Subject: [PATCH v2 10/20] Partially fix some extremely broken code from
525392d57.
Avoid leaking memory in the stmt_context during BuildCachedPlan.
Sadly, this code has problems a lot worse than that (per the
documentation I added), so I suspect 525392d57 will get reverted
and we won't need this patch.
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
src/backend/utils/cache/plancache.c | 37 ++++++++++++++++++++++-------
1 file changed, 29 insertions(+), 8 deletions(-)
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 9bcbc4c3e97..40ba3e9df7c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1109,22 +1109,32 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
*/
if (!plansource->is_oneshot)
{
+ List *stmt_plist;
+
plan_context = AllocSetContextCreate(CurrentMemoryContext,
"CachedPlan",
ALLOCSET_START_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
- stmt_context = AllocSetContextCreate(CurrentMemoryContext,
+ stmt_context = AllocSetContextCreate(plan_context,
"CachedPlan PlannedStmts",
ALLOCSET_START_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
- MemoryContextSetParent(stmt_context, plan_context);
+ /*
+ * Copy plans into the stmt_context.
+ */
MemoryContextSwitchTo(stmt_context);
- plist = copyObject(plist);
+ stmt_plist = copyObject(plist);
+ /*
+ * We actually need the top-level List object to be in the long-lived
+ * plan_context, in case UpdateCachedPlan wants to update it; see
+ * comments therein. Do a shallow copy to make that happen.
+ */
MemoryContextSwitchTo(plan_context);
- plist = list_copy(plist);
+ plist = list_copy(stmt_plist);
+ list_free(stmt_plist); /* be tidy */
}
else
plan_context = CurrentMemoryContext;
@@ -1251,12 +1261,22 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
/*
* Planning work is done in the caller's memory context. The resulting
- * PlannedStmt is then copied into plan->stmt_context after throwing away
- * the old ones.
+ * PlannedStmt(s) are then copied into plan->stmt_context after throwing
+ * away the old ones. But note that we re-use the long-lived
+ * plan->stmt_list list to hold the pointers to the PlannedStmts. This
+ * kluge avoids breaking code that is iterating over that list, so long as
+ * it's between statements and not currently using one of the contained
+ * PlannedStmts.
+ *
+ * XXX this is, if not actively broken, at least unbelievably fragile.
+ * Aside from the likelihood that the just-stated assumption doesn't hold
+ * universally, there is not a good reason to believe that the length of
+ * the plan list is constant.
*/
plan_list = pg_plan_queries(query_list, plansource->query_string,
plansource->cursor_options, NULL);
- Assert(list_length(plan_list) == list_length(plan->stmt_list));
+ if (list_length(plan_list) != list_length(plan->stmt_list))
+ elog(ERROR, "UpdateCachedPlan(): plan list length changed");
MemoryContextReset(plan->stmt_context);
oldcxt = MemoryContextSwitchTo(plan->stmt_context);
@@ -1276,7 +1296,8 @@ UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
/*
* We've updated all the plans that might have been invalidated, so mark
- * the CachedPlan as valid.
+ * the CachedPlan as valid. XXX wrong: we could already have hit a new
+ * invalidation event.
*/
plan->is_valid = true;
--
2.43.5
[text/plain] break_cached_plan.sql (778B, 3-break_cached_plan.sql)
* Re: generic plans and "initial" pruning
@ 2025-05-20 07:59 Tomas Vondra <[email protected]>
parent: Tom Lane <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Tomas Vondra @ 2025-05-20 07:59 UTC (permalink / raw)
To: Tom Lane <[email protected]>; Amit Langote <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On 5/20/25 05:06, Tom Lane wrote:
> Amit Langote <[email protected]> writes:
>> Pushed after some tweaks to comments and the test case.
>
> My attention was drawn to commit 525392d57 after observing that
> Valgrind complained about a memory leak in some code that commit added
> to BuildCachedPlan(). I tried to make sense of said code so I could
> remove the leak, and eventually arrived at the attached patch, which
> is part of a series of leak-fixing things hence the high sequence
> number.
>
> Unfortunately, the bad things I speculated about in the added comments
> seem to be reality. The second attached file is a test case that
> triggers
>
> ...
FYI I added this as a PG18 open item:
https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
regards
--
Tomas Vondra
* Re: generic plans and "initial" pruning
@ 2025-05-20 13:25 Amit Langote <[email protected]>
parent: Tom Lane <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Amit Langote @ 2025-05-20 13:25 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Hi Tom,
On Tue, May 20, 2025 at 12:06 PM Tom Lane <[email protected]> wrote:
> My attention was drawn to commit 525392d57 after observing that
> Valgrind complained about a memory leak in some code that commit added
> to BuildCachedPlan(). I tried to make sense of said code so I could
> remove the leak, and eventually arrived at the attached patch, which
> is part of a series of leak-fixing things hence the high sequence
> number.
>
> Unfortunately, the bad things I speculated about in the added comments
> seem to be reality. The second attached file is a test case that
> triggers
>
> TRAP: failed Assert("list_length(plan_list) == list_length(plan->stmt_list)"), File: "plancache.c", Line: 1259, PID: 602087
>
> because it adds a DO ALSO rule that causes the rewriter to generate
> more PlannedStmts than it did before.
>
> This is quite awful, because it does more than simply break the klugy
> (and undocumented) business about keeping the top-level List in a
> different context. What it means is that any outside code that is
> busy iterating that List is very fundamentally broken: it's not clear
> what List index it ought to resume at, except that "the one it was at"
> is demonstrably incorrect.
>
> I also don't really believe the (also undocumented) assumption that
> such outside code is in between executions of PlannedStmts of the
> List and hence can tolerate those being ripped out and replaced.
> I have not attempted to build an example, because the one I have
> seems sufficiently damning. But I bet that a recursive function
> could be constructed in such a way that an outer execution is
> still in progress when an inner call triggers UpdateCachedPlan.
>
> Another small problem (much more easily fixable than the above,
> probably) is that summarily setting "plan->is_valid = true"
> at the end is not okay. We could already have received an
> invalidation that should result in marking the plan stale.
> (Holding locks on the tables involved is not sufficient to
> prevent that, as there are other sources of inval events.)
Thanks for pointing out the hole in the current handling of
CachedPlan->stmt_list. You're right that the approach of preserving
the list structure while replacing its contents in-place doesn’t hold
up when the rewriter adds or removes statements dynamically. There
might be other cases that neither of us has tried. I don’t think
that mechanism is salvageable.
To address the issue without needing a full revert, I’m considering
dropping UpdateCachedPlan() and removing the associated MemoryContext
dance that preserves the CachedPlan->stmt_list structure. Instead, the
executor would replan the necessary query into a transient list of
PlannedStmts, leaving the original CachedPlan untouched. That avoids
mutating shared plan state during execution and still enables deferred
locking in the vast majority of cases.
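A rough standalone sketch of that idea, under the assumption that the
executor can tell a stale plan from a valid one by a flag. SketchPlan and
sketch_plan_for_execution() are hypothetical names, not existing
PostgreSQL APIs, and malloc() stands in for executor-local memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached plan; not a PostgreSQL type. */
typedef struct SketchPlan
{
	bool		is_valid;
	int			generation;
} SketchPlan;

/*
 * Pick the plan to execute.  If the cached plan is still valid, use it
 * as-is.  Otherwise build a transient replacement owned by the caller and
 * leave the cached plan untouched (note the const qualifier).
 */
const SketchPlan *
sketch_plan_for_execution(const SketchPlan *cached, SketchPlan **transient)
{
	assert(cached != NULL && transient != NULL);

	*transient = NULL;
	if (cached->is_valid)
		return cached;

	*transient = malloc(sizeof(SketchPlan));
	if (*transient == NULL)
		abort();
	(*transient)->is_valid = true;
	(*transient)->generation = cached->generation + 1;
	return *transient;
}
```

The caller frees the transient plan when execution ends; the cached plan
is only ever read here, never modified.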
There are two variants of this approach. In the simpler form, the
transient PlannedStmt list exists only in executor-local memory and
isn’t registered with the invalidation machinery. That might be
acceptable in practice, since all referenced relations are locked at
that point -- but it would mean any invalidation events delivered
during execution are ignored. The more robust variant is to build a
one-query standalone CachedPlan using something like
GetTransientCachedPlanForQuery(), which I had proposed back in [1].
This gets added to a standalone_plan_list so that invalidation
callbacks can still reach it. I dropped that design earlier [2] due to
the cleanup overhead, but I’d be happy to bring it back in a
simplified form if that seems preferable.
One open question in either case is what to do if the number of
PlannedStmts in the rewritten plan changes, as in your example. Would
it be reasonable to just go ahead and execute the additional
statements from the transient plan, even though the original
CachedPlan wouldn’t have known about them until the next use? That
would avoid introducing any new failure behavior while still handling
the invalidation correctly for the current execution.
> It's possible that this code can be fixed, but I fear it's
> going to involve some really fundamental redesign, which
> probably shouldn't be happening after beta1. I think there
> is no alternative but to revert for v18.
...Beyond that, I think I’ve run out of clean options for making
deferred locking executor-local while keeping invalidation safe. I
know you'd previously objected (with good reason) to making
GetCachedPlan() itself run pruning logic to determine which partitions
to lock -- and to the idea of carrying or sharing the result of that
pruning back to the executor via interface changes in the path from
plancache.c through its callers down to ExecutorStart(). So I’ve
steered away from revisiting that direction. But if we’re not
comfortable with either of the transient replanning options, then we
may end up shelving the deferred locking idea entirely -- which would
be unfortunate, given how much it helps workloads that rely on generic
plans over large partitioned tables.
Let me know what you think -- I’ll hold off on posting a revert or a
replacement until we’ve agreed on the path forward.
--
Thanks, Amit Langote
[1] https://www.postgresql.org/message-id/CA%2BHiwqGSOge3eT3kcm_nxCSA3Ut%2Bd0jtchi8g8J9uXi-kyC7Jw%40mail...
[2] https://www.postgresql.org/message-id/CA%2BHiwqHRRFQN6yZ54fBydOTM6ncqZBCmewZ6n519RjRdDsO44g%40mail.g...
* Re: generic plans and "initial" pruning
@ 2025-05-20 15:38 Tom Lane <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 2 replies; 82+ messages in thread
From: Tom Lane @ 2025-05-20 15:38 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Amit Langote <[email protected]> writes:
> Thanks for pointing out the hole in the current handling of
> CachedPlan->stmt_list. You're right that the approach of preserving
> the list structure while replacing its contents in-place doesn’t hold
> up when the rewriter adds or removes statements dynamically. There
> might be other cases that neither of us has tried. I don’t think
> that mechanism is salvageable.
> To address the issue without needing a full revert, I’m considering
> dropping UpdateCachedPlan() and removing the associated MemoryContext
> dance to preserve CachedPlan->stmt_list structure. Instead, the
> executor would replan the necessary query into a transient list of
> PlannedStmts, leaving the original CachedPlan untouched. That avoids
> mutating shared plan state during execution and still enables deferred
> locking in the vast majority of cases.
Yeah, I think messing with the CachedPlan is just fundamentally wrong.
It breaks the invariant that the executor should not scribble on what
it's handed --- maybe not as obviously as some other cases, but it's
still not a good design.
I kind of feel that we ought to take two steps back and think
about what it even means to have a generic plan in this situation.
Perhaps we should simply refuse to use that code path if there are
prunable partitioned tables involved?
> Let me know what you think -- I’ll hold off on posting a revert or a
> replacement until we’ve agreed on the path forward.
I had not looked at 525392d57 in any detail before (the claim in
the commit message that I reviewed it is a figment of someone's
imagination). Now that I have, I'm still going to argue for revert.
Aside from the points above, I really hate what's been done to the
fundamental executor APIs. The fact that ExecutorStart callers have
to know about this is as ugly as can be. I also don't like the
fact that it's added overhead in cases where there can be no benefit
(notice that my test case doesn't even involve a partitioned table).
I still like the core idea of deferring locking, but I don't like
anything about this implementation of it. It seems like there has
to be a better and simpler way.
regards, tom lane
* Re: generic plans and "initial" pruning
@ 2025-05-21 10:22 Amit Langote <[email protected]>
parent: Tom Lane <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Amit Langote @ 2025-05-21 10:22 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Wed, May 21, 2025 at 12:38 AM Tom Lane <[email protected]> wrote:
> Amit Langote <[email protected]> writes:
> > Thanks for pointing out the hole in the current handling of
> > CachedPlan->stmt_list. You're right that the approach of preserving
> > the list structure while replacing its contents in-place doesn’t hold
> > up when the rewriter adds or removes statements dynamically. There
> > might be other cases that neither of us has tried. I don’t think
> > that mechanism is salvageable.
>
> > To address the issue without needing a full revert, I’m considering
> > dropping UpdateCachedPlan() and removing the associated MemoryContext
> > dance to preserve CachedPlan->stmt_list structure. Instead, the
> > executor would replan the necessary query into a transient list of
> > PlannedStmts, leaving the original CachedPlan untouched. That avoids
> > mutating shared plan state during execution and still enables deferred
> > locking in the vast majority of cases.
>
> Yeah, I think messing with the CachedPlan is just fundamentally wrong.
> It breaks the invariant that the executor should not scribble on what
> it's handed --- maybe not as obviously as some other cases, but it's
> still not a good design.
Fair enough. I’ll revert this and some related changes shortly. WIP
patch attached.
> I kind of feel that we ought to take two steps back and think
> about what it even means to have a generic plan in this situation.
> Perhaps we should simply refuse to use that code path if there are
> prunable partitioned tables involved?
Sorry, I’m not sure I fully understand -- especially what you mean by
“that code path.” If you're referring to the generic plan creation and
reuse path in general, I'd point out that initial runtime pruning was
introduced largely to improve the efficiency of generic plan execution
(albeit without addressing the locking bottleneck at the time -- David
Rowley had explored that earlier). So simply disallowing generic plans
when partitions are involved feels like an odd direction, given that a
major motivation for initial pruning was to make those cases faster.
Custom plans can win when parameters are available, of course, but
there's a major use case involving stable expressions like now() with
time-based partitions, where plan_cache_mode = auto will still choose
a generic plan. So I wouldn’t say that optimizing generic plan
execution -- especially the goal of this project -- is wasted effort
in practice.
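To make the now() case concrete, here is a minimal standalone sketch of
what "initial" pruning buys for a qual of the form key >= bound, where
bound is only known at executor startup. The types and the function are
hypothetical and far simpler than the real pruning steps; partitions are
modelled as half-open ranges [lo, hi).

```c
#include <assert.h>

/* Hypothetical range partition covering keys in [lo, hi). */
typedef struct SketchRangePart
{
	long		lo;
	long		hi;
} SketchRangePart;

/*
 * Collect the indexes of partitions that can contain a key >= bound.
 * Returns the number of survivors written to 'survivors', which must have
 * room for nparts entries.
 */
int
sketch_initial_prune(const SketchRangePart *parts, int nparts,
					 long bound, int *survivors)
{
	int			nsurvivors = 0;

	for (int i = 0; i < nparts; i++)
	{
		/* A partition survives if its upper bound lies above 'bound'. */
		if (parts[i].hi > bound)
			survivors[nsurvivors++] = i;
	}
	return nsurvivors;
}
```

With deferred locking, only the surviving partitions would need locks
for that execution; the pruned ones would not be touched at all.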
> > Let me know what you think -- I’ll hold off on posting a revert or a
> > replacement until we’ve agreed on the path forward.
>
> I had not looked at 525392d57 in any detail before (the claim in
> the commit message that I reviewed it is a figment of someone's
> imagination).
Apologies if I gave the misleading impression that you were on board
with the current design. I meant only to acknowledge your earlier
engagement with the general idea, which I appreciated. I marked it as
“(old versions)” in the commit metadata to reflect that -- clearly I
should’ve been more precise. I know that the meaning of Reviewed-by
and other tags is evolving and I clearly haven't kept up.
> Now that I have, I'm still going to argue for revert.
> Aside from the points above, I really hate what's been done to the
> fundamental executor APIs. The fact that ExecutorStart callers have
> to know about this is as ugly as can be. I also don't like the
> fact that it's added overhead in cases where there can be no benefit
> (notice that my test case doesn't even involve a partitioned table).
I tried to keep the overhead low by ensuring that the only additional
thing we'd be doing in the regular path is a CachedPlan->is_valid
boolean check in a couple of places, and that further work would only
happen if invalidation actually occurred. That said, I realize the
patch makes invalidation handling apply in more cases than before,
which may itself be seen as added overhead. But I may have
misunderstood your concern -- perhaps it's more about the layering
violation than the raw cycles?
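Roughly, the shape of that check is sketched below. This is a
simplified standalone model with hypothetical names, not the code from
the patch: the common path pays for one flag test, and anything more
expensive runs only when the flag says the plan went stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the PostgreSQL structs of similar names. */
typedef struct SketchCachedPlan
{
	bool		is_valid;
} SketchCachedPlan;

typedef struct SketchExecState
{
	const SketchCachedPlan *cplan;	/* NULL when not from a cached plan */
	int			slow_path_runs;		/* counts invalidation handling */
} SketchExecState;

/* Fast check: a plan with no CachedPlan behind it is always valid. */
bool
sketch_plan_still_valid(const SketchExecState *estate)
{
	return estate->cplan == NULL || estate->cplan->is_valid;
}

/*
 * Returns true if startup can proceed.  Cleanup-and-retry work is modelled
 * by the counter and happens only when the plan was invalidated.
 */
bool
sketch_start(SketchExecState *estate)
{
	assert(estate != NULL);

	if (sketch_plan_still_valid(estate))
		return true;

	estate->slow_path_runs++;
	return false;
}
```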
> I still like the core idea of deferring locking, but I don't like
> anything about this implementation of it. It seems like there has
> to be a better and simpler way.
It's good to hear that you still like the core idea -- I’d really
appreciate it if you're willing to continue bearing with me as I try
to rework this in a way that's cleaner and better aligned with the
overall design. I'd welcome any thoughts you have along the way. I
know this has been a difficult project, and I don't mean to come
across as taking any of it lightly. I'm still hopeful there's a path
forward, but I completely understand the need to reset here.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v1-0001-Revert-Don-t-lock-partitions-pruned-by-initial-pr.patch (66.5K, 2-v1-0001-Revert-Don-t-lock-partitions-pruned-by-initial-pr.patch)
From 260d3fbf4801402f1a2ffd947f1f05fd3cad6878 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 21 May 2025 18:46:52 +0900
Subject: [PATCH v1] Revert "Don't lock partitions pruned by initial pruning"
As pointed out by Tom Lane, the patch introduced a fragile and invasive
design around plan invalidation handling when locking of prunable
partitions was deferred from plancache.c to the executor. In
particular, it violated assumptions about CachedPlan immutability and
altered executor APIs in ways that are difficult to justify given the
added complexity and overhead.
This also removes the firstResultRels field added to PlannedStmt in
commit 28317de72, which was intended to support deferred locking of
certain ModifyTable result relations.
Reported-by: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]
---
contrib/auto_explain/auto_explain.c | 16 +-
.../pg_stat_statements/pg_stat_statements.c | 16 +-
src/backend/commands/copyto.c | 5 +-
src/backend/commands/createas.c | 5 +-
src/backend/commands/explain.c | 22 +-
src/backend/commands/extension.c | 4 +-
src/backend/commands/matview.c | 5 +-
src/backend/commands/portalcmds.c | 1 -
src/backend/commands/prepare.c | 9 +-
src/backend/commands/trigger.c | 15 --
src/backend/executor/README | 35 +---
src/backend/executor/execMain.c | 127 +----------
src/backend/executor/execParallel.c | 12 +-
src/backend/executor/execPartition.c | 67 +-----
src/backend/executor/execUtils.c | 1 -
src/backend/executor/functions.c | 4 +-
src/backend/executor/spi.c | 29 +--
src/backend/optimizer/plan/planner.c | 2 -
src/backend/optimizer/plan/setrefs.c | 3 -
src/backend/tcop/postgres.c | 4 +-
src/backend/tcop/pquery.c | 51 +----
src/backend/utils/cache/plancache.c | 197 +++---------------
src/backend/utils/mmgr/portalmem.c | 4 +-
src/include/commands/explain.h | 6 +-
src/include/commands/trigger.h | 1 -
src/include/executor/execdesc.h | 2 -
src/include/executor/executor.h | 33 +--
src/include/nodes/execnodes.h | 3 -
src/include/nodes/pathnodes.h | 3 -
src/include/nodes/plannodes.h | 7 -
src/include/utils/plancache.h | 46 +---
src/include/utils/portal.h | 4 +-
32 files changed, 88 insertions(+), 651 deletions(-)
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index cd6625020a7..1f4badb4928 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -81,7 +81,7 @@ static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;
static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;
-static bool explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static void explain_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void explain_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -261,11 +261,9 @@ _PG_init(void)
/*
* ExecutorStart hook: start up logging if needed
*/
-static bool
+static void
explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
- bool plan_valid;
-
/*
* At the beginning of each top-level statement, decide whether we'll
* sample this statement. If nested-statement explaining is enabled,
@@ -301,13 +299,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
}
if (prev_ExecutorStart)
- plan_valid = prev_ExecutorStart(queryDesc, eflags);
+ prev_ExecutorStart(queryDesc, eflags);
else
- plan_valid = standard_ExecutorStart(queryDesc, eflags);
-
- /* The plan may have become invalid during standard_ExecutorStart() */
- if (!plan_valid)
- return false;
+ standard_ExecutorStart(queryDesc, eflags);
if (auto_explain_enabled())
{
@@ -325,8 +319,6 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
-
- return true;
}
/*
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9778407cba3..d8fdf42df79 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -335,7 +335,7 @@ static PlannedStmt *pgss_planner(Query *parse,
const char *query_string,
int cursorOptions,
ParamListInfo boundParams);
-static bool pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
+static void pgss_ExecutorStart(QueryDesc *queryDesc, int eflags);
static void pgss_ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction,
uint64 count);
@@ -989,19 +989,13 @@ pgss_planner(Query *parse,
/*
* ExecutorStart hook: start up tracking if needed
*/
-static bool
+static void
pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
- bool plan_valid;
-
if (prev_ExecutorStart)
- plan_valid = prev_ExecutorStart(queryDesc, eflags);
+ prev_ExecutorStart(queryDesc, eflags);
else
- plan_valid = standard_ExecutorStart(queryDesc, eflags);
-
- /* The plan may have become invalid during standard_ExecutorStart() */
- if (!plan_valid)
- return false;
+ standard_ExecutorStart(queryDesc, eflags);
/*
* If query has queryId zero, don't track it. This prevents double
@@ -1024,8 +1018,6 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcxt);
}
}
-
- return true;
}
/*
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index f87e405351d..ea6f18f2c80 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -835,7 +835,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
@@ -845,8 +845,7 @@ BeginCopyTo(ParseState *pstate,
*
* ExecutorStart computes a result tupdesc for us
*/
- if (!ExecutorStart(cstate->queryDesc, 0))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(cstate->queryDesc, 0);
tupDesc = cstate->queryDesc->tupDesc;
}
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 0a4155773eb..dfd2ab8e862 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -334,13 +334,12 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/* call ExecutorStart to prepare the plan for execution */
- if (!ExecutorStart(queryDesc, GetIntoRelEFlags(into)))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(queryDesc, GetIntoRelEFlags(into));
/* run the plan to completion */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 786ee865f14..09ea30dfb92 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -369,8 +369,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, NULL, NULL, -1, into, es, queryString, params,
- queryEnv,
+ ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,9 +491,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
- CachedPlanSource *plansource, int query_index,
- IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -550,7 +547,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, cplan, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
@@ -564,17 +561,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
if (into)
eflags |= GetIntoRelEFlags(into);
- /* Prepare the plan for execution. */
- if (queryDesc->cplan)
- {
- ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
- Assert(queryDesc->planstate);
- }
- else
- {
- if (!ExecutorStart(queryDesc, eflags))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
- }
+ /* call ExecutorStart to prepare the plan for execution */
+ ExecutorStart(queryDesc, eflags);
/* Execute the plan for statistics if asked for */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 73c52e970f6..e6f9ab6dfd6 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -993,13 +993,11 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
- NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
- if (!ExecutorStart(qdesc, 0))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
ExecutorFinish(qdesc);
ExecutorEnd(qdesc);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index e7854add178..27c2cb26ef5 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -438,13 +438,12 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, NULL, queryString,
+ queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
/* call ExecutorStart to prepare the plan for execution */
- if (!ExecutorStart(queryDesc, 0))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(queryDesc, 0);
/* run the plan */
ExecutorRun(queryDesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 4c2ac045224..e7c8171c102 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -117,7 +117,6 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
- NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index bf7d2b2309f..34b6410d6a2 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,8 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- cplan,
- entry->plansource);
+ cplan);
/*
* For CREATE TABLE ... AS EXECUTE, we must verify that the prepared
@@ -586,7 +585,6 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
- int query_index = 0;
if (es->memory)
{
@@ -659,8 +657,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, cplan, entry->plansource, query_index,
- into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
@@ -671,8 +668,6 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
/* Separate plans with an appropriate separator */
if (lnext(plan_list, p) != NULL)
ExplainSeparatePlans(es);
-
- query_index++;
}
if (estate)
diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c
index c9f61130c69..67f8e70f9c1 100644
--- a/src/backend/commands/trigger.c
+++ b/src/backend/commands/trigger.c
@@ -5057,21 +5057,6 @@ AfterTriggerBeginQuery(void)
}
-/* ----------
- * AfterTriggerAbortQuery()
- *
- * Called by standard_ExecutorEnd() if the query execution was aborted due to
- * the plan becoming invalid during initialization.
- * ----------
- */
-void
-AfterTriggerAbortQuery(void)
-{
- /* Revert the actions of AfterTriggerBeginQuery(). */
- afterTriggers.query_depth--;
-}
-
-
/* ----------
* AfterTriggerEndQuery()
*
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 02745c23ed9..54f4782f31b 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -285,28 +285,6 @@ are typically reset to empty once per tuple. Per-tuple contexts are usually
associated with ExprContexts, and commonly each PlanState node has its own
ExprContext to evaluate its qual and targetlist expressions in.
-Relation Locking
-----------------
-
-When the executor initializes a plan tree for execution, it doesn't lock
-non-index relations if the plan tree is freshly generated and not derived
-from a CachedPlan. This is because such locks have already been established
-during the query's parsing, rewriting, and planning phases. However, with a
-cached plan tree, some relations may remain unlocked. The function
-AcquireExecutorLocks() only locks unprunable relations in the plan, deferring
-the locking of prunable ones to executor initialization. This avoids
-unnecessary locking of relations that will be pruned during "initial" runtime
-pruning in ExecDoInitialPruning().
-
-This approach creates a window where a cached plan tree with child tables
-could become outdated if another backend modifies these tables before
-ExecDoInitialPruning() locks them. As a result, the executor has the added duty
-to verify the plan tree's validity whenever it locks a child table after
-doing initial pruning. This validation is done by checking the CachedPlan.is_valid
-flag. If the plan tree is outdated (is_valid = false), the executor stops
-further initialization, cleans up anything in EState that would have been
-allocated up to that point, and retries execution after recreating the
-invalid plan in the CachedPlan. See ExecutorStartCachedPlan().
Query Processing Control Flow
-----------------------------
@@ -315,13 +293,11 @@ This is a sketch of control flow for full query processing:
CreateQueryDesc
- ExecutorStart or ExecutorStartCachedPlan
+ ExecutorStart
CreateExecutorState
creates per-query context
- switch to per-query context to run ExecDoInitialPruning and ExecInitNode
+ switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
- ExecDoInitialPruning
- does initial pruning and locks surviving partitions if needed
ExecInitNode --- recursively scans plan tree
ExecInitNode
recurse into subsidiary nodes
@@ -345,12 +321,7 @@ This is a sketch of control flow for full query processing:
FreeQueryDesc
-As mentioned in the "Relation Locking" section, if the plan tree is found to
-be stale after locking partitions in ExecDoInitialPruning(), the control is
-immediately returned to ExecutorStartCachedPlan(), which will create a new plan
-tree and perform the steps starting from CreateExecutorState() again.
-
-Per above comments, it's not really critical for ExecEndPlan to free any
+Per above comments, it's not really critical for ExecEndNode to free any
memory; it'll all go away in FreeExecutorState anyway. However, we do need to
be careful to close relations, drop buffer pins, etc, so we do need to scan
the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 7230f968101..0391798dd2c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -55,13 +55,11 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
-#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
-#include "utils/plancache.h"
#include "utils/rls.h"
#include "utils/snapmgr.h"
@@ -119,16 +117,11 @@ static void ReportNotNullViolationError(ResultRelInfo *resultRelInfo,
* get control when ExecutorStart is called. Such a plugin would
* normally call standard_ExecutorStart().
*
- * Return value indicates if the plan has been initialized successfully so
- * that queryDesc->planstate contains a valid PlanState tree. It may not
- * if the plan got invalidated during InitPlan().
* ----------------------------------------------------------------
*/
-bool
+void
ExecutorStart(QueryDesc *queryDesc, int eflags)
{
- bool plan_valid;
-
/*
* In some cases (e.g. an EXECUTE statement or an execute message with the
* extended query protocol) the query_id won't be reported, so do it now.
@@ -140,14 +133,12 @@ ExecutorStart(QueryDesc *queryDesc, int eflags)
pgstat_report_query_id(queryDesc->plannedstmt->queryId, false);
if (ExecutorStart_hook)
- plan_valid = (*ExecutorStart_hook) (queryDesc, eflags);
+ (*ExecutorStart_hook) (queryDesc, eflags);
else
- plan_valid = standard_ExecutorStart(queryDesc, eflags);
-
- return plan_valid;
+ standard_ExecutorStart(queryDesc, eflags);
}
-bool
+void
standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
EState *estate;
@@ -271,64 +262,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
InitPlan(queryDesc, eflags);
MemoryContextSwitchTo(oldcontext);
-
- return ExecPlanStillValid(queryDesc->estate);
-}
-
-/*
- * ExecutorStartCachedPlan
- * Start execution for a given query in the CachedPlanSource, replanning
- * if the plan is invalidated due to deferred locks taken during the
- * plan's initialization
- *
- * This function handles cases where the CachedPlan given in queryDesc->cplan
- * might become invalid during the initialization of the plan given in
- * queryDesc->plannedstmt, particularly when prunable relations in it are
- * locked after performing initial pruning. If the locks invalidate the plan,
- * the function calls UpdateCachedPlan() to replan all queries in the
- * CachedPlan, and then retries initialization.
- *
- * The function repeats the process until ExecutorStart() successfully
- * initializes the plan, that is without the CachedPlan becoming invalid.
- */
-void
-ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
- CachedPlanSource *plansource,
- int query_index)
-{
- if (unlikely(queryDesc->cplan == NULL))
- elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlan");
- if (unlikely(plansource == NULL))
- elog(ERROR, "ExecutorStartCachedPlan(): missing CachedPlanSource");
-
- /*
- * Loop and retry with an updated plan until no further invalidation
- * occurs.
- */
- while (1)
- {
- if (!ExecutorStart(queryDesc, eflags))
- {
- /*
- * Clean up the current execution state before creating the new
- * plan to retry ExecutorStart(). Mark execution as aborted to
- * ensure that AFTER trigger state is properly reset.
- */
- queryDesc->estate->es_aborted = true;
- ExecutorEnd(queryDesc);
-
- /* Retry ExecutorStart() with an updated plan tree. */
- queryDesc->plannedstmt = UpdateCachedPlan(plansource, query_index,
- queryDesc->queryEnv);
- }
- else
-
- /*
- * Exit the loop if the plan is initialized successfully and no
- * sinval messages were received that invalidated the CachedPlan.
- */
- break;
- }
}
/* ----------------------------------------------------------------
@@ -387,7 +320,6 @@ standard_ExecutorRun(QueryDesc *queryDesc,
estate = queryDesc->estate;
Assert(estate != NULL);
- Assert(!estate->es_aborted);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/* caller must ensure the query's snapshot is active */
@@ -494,11 +426,8 @@ standard_ExecutorFinish(QueryDesc *queryDesc)
Assert(estate != NULL);
Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
- /*
- * This should be run once and only once per Executor instance and never
- * if the execution was aborted.
- */
- Assert(!estate->es_finished && !estate->es_aborted);
+ /* This should be run once and only once per Executor instance */
+ Assert(!estate->es_finished);
/* Switch into per-query memory context */
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -561,10 +490,11 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
(PgStat_Counter) estate->es_parallel_workers_launched);
/*
- * Check that ExecutorFinish was called, unless in EXPLAIN-only mode or if
- * execution was aborted.
+ * Check that ExecutorFinish was called, unless in EXPLAIN-only mode. This
+ * Assert is needed because ExecutorFinish is new as of 9.1, and callers
+ * might forget to call it.
*/
- Assert(estate->es_finished || estate->es_aborted ||
+ Assert(estate->es_finished ||
(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));
/*
@@ -578,14 +508,6 @@ standard_ExecutorEnd(QueryDesc *queryDesc)
UnregisterSnapshot(estate->es_snapshot);
UnregisterSnapshot(estate->es_crosscheck_snapshot);
- /*
- * Reset AFTER trigger module if the query execution was aborted.
- */
- if (estate->es_aborted &&
- !(estate->es_top_eflags &
- (EXEC_FLAG_SKIP_TRIGGERS | EXEC_FLAG_EXPLAIN_ONLY)))
- AfterTriggerAbortQuery();
-
/*
* Must switch out of context before destroying it
*/
@@ -684,21 +606,6 @@ ExecCheckPermissions(List *rangeTable, List *rteperminfos,
(rte->rtekind == RTE_SUBQUERY &&
rte->relkind == RELKIND_VIEW));
- /*
- * Ensure that we have at least an AccessShareLock on relations
- * whose permissions need to be checked.
- *
- * Skip this check in a parallel worker because locks won't be
- * taken until ExecInitNode() performs plan initialization.
- *
- * XXX: ExecCheckPermissions() in a parallel worker may be
- * redundant with the checks done in the leader process, so this
- * should be reviewed to ensure it’s necessary.
- */
- Assert(IsParallelWorker() ||
- CheckRelationOidLockedByMe(rte->relid, AccessShareLock,
- true));
-
(void) getRTEPermissionInfo(rteperminfos, rte);
/* Many-to-one mapping not allowed */
Assert(!bms_is_member(rte->perminfoindex, indexset));
@@ -924,12 +831,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
*
* Initializes the query plan: open files, allocate storage
* and start up the rule manager
- *
- * If the plan originates from a CachedPlan (given in queryDesc->cplan),
- * it can become invalid during runtime "initial" pruning when the
- * remaining set of locks is taken. The function returns early in that
- * case without initializing the plan, and the caller is expected to
- * retry with a new valid plan.
* ----------------------------------------------------------------
*/
static void
@@ -937,7 +838,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
{
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
- CachedPlan *cachedplan = queryDesc->cplan;
Plan *plan = plannedstmt->planTree;
List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
@@ -958,7 +858,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
bms_copy(plannedstmt->unprunableRelids));
estate->es_plannedstmt = plannedstmt;
- estate->es_cachedplan = cachedplan;
estate->es_part_prune_infos = plannedstmt->partPruneInfos;
/*
@@ -972,9 +871,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
*/
ExecDoInitialPruning(estate);
- if (!ExecPlanStillValid(estate))
- return;
-
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
*/
@@ -3092,9 +2988,6 @@ EvalPlanQualStart(EPQState *epqstate, Plan *planTree)
* the snapshot, rangetable, and external Param info. They need their own
* copies of local state, including a tuple table, es_param_exec_vals,
* result-rel info, etc.
- *
- * es_cachedplan is not copied because EPQ plan execution does not acquire
- * any new locks that could invalidate the CachedPlan.
*/
rcestate->es_direction = ForwardScanDirection;
rcestate->es_snapshot = parentestate->es_snapshot;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 39c990ae638..f3e77bda279 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1278,15 +1278,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
- /*
- * Create a QueryDesc for the query. We pass NULL for cachedplan, because
- * we don't have a pointer to the CachedPlan in the leader's process. It's
- * fine because the only reason the executor needs to see it is to decide
- * if it should take locks on certain relations, but parallel workers
- * always take locks anyway.
- */
+ /* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
- NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
@@ -1471,8 +1464,7 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
/* Start up the executor */
queryDesc->plannedstmt->jitFlags = fpes->jit_flags;
- if (!ExecutorStart(queryDesc, fpes->eflags))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(queryDesc, fpes->eflags);
/* Special executor initialization steps for parallel workers */
queryDesc->planstate->state->es_query_dsa = area;
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 3f8a4cb5244..3299db22bd5 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -26,7 +26,6 @@
#include "partitioning/partdesc.h"
#include "partitioning/partprune.h"
#include "rewrite/rewriteManip.h"
-#include "storage/lmgr.h"
#include "utils/acl.h"
#include "utils/lsyscache.h"
#include "utils/partcache.h"
@@ -1771,8 +1770,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * all plan nodes that contain a PartitionPruneInfo. This also locks the
- * leaf partitions whose subnodes will be initialized if needed.
+ * all plan nodes that contain a PartitionPruneInfo.
*
* ExecInitPartitionExecPruning:
* Updates the PartitionPruneState found at given part_prune_index in
@@ -1793,13 +1791,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
-
/*
* ExecDoInitialPruning
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
- * plan nodes that support partition pruning. This also locks the leaf
- * partitions whose subnodes will be initialized if needed.
+ * plan nodes that support partition pruning.
*
* This function iterates over each PartitionPruneInfo entry in
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
@@ -1821,9 +1817,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
void
ExecDoInitialPruning(EState *estate)
{
- PlannedStmt *stmt = estate->es_plannedstmt;
ListCell *lc;
- List *locked_relids = NIL;
foreach(lc, estate->es_part_prune_infos)
{
@@ -1849,68 +1843,11 @@ ExecDoInitialPruning(EState *estate)
else
validsubplan_rtis = all_leafpart_rtis;
- if (ExecShouldLockRelations(estate))
- {
- int rtindex = -1;
-
- while ((rtindex = bms_next_member(validsubplan_rtis,
- rtindex)) >= 0)
- {
- RangeTblEntry *rte = exec_rt_fetch(rtindex, estate);
-
- Assert(rte->rtekind == RTE_RELATION &&
- rte->rellockmode != NoLock);
- LockRelationOid(rte->relid, rte->rellockmode);
- locked_relids = lappend_int(locked_relids, rtindex);
- }
- }
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
estate->es_part_prune_results = lappend(estate->es_part_prune_results,
validsubplans);
}
-
- /*
- * Lock the first result relation of each ModifyTable node, even if it was
- * pruned. This is required for ExecInitModifyTable(), which keeps its
- * first result relation if all other result relations have been pruned,
- * because some executor paths (e.g., in nodeModifyTable.c and
- * execPartition.c) rely on there being at least one result relation.
- *
- * There's room for improvement here --- we actually only need to do this
- * if all other result relations of the ModifyTable node were pruned, but
- * we don't have an easy way to tell that here.
- */
- if (stmt->resultRelations && ExecShouldLockRelations(estate))
- {
- foreach(lc, stmt->firstResultRels)
- {
- Index firstResultRel = lfirst_int(lc);
-
- if (!bms_is_member(firstResultRel, estate->es_unpruned_relids))
- {
- RangeTblEntry *rte = exec_rt_fetch(firstResultRel, estate);
-
- Assert(rte->rtekind == RTE_RELATION && rte->rellockmode != NoLock);
- LockRelationOid(rte->relid, rte->rellockmode);
- locked_relids = lappend_int(locked_relids, firstResultRel);
- }
- }
- }
-
- /*
- * Release the useless locks if the plan won't be executed. This is the
- * same as what CheckCachedPlan() in plancache.c does.
- */
- if (!ExecPlanStillValid(estate))
- {
- foreach(lc, locked_relids)
- {
- RangeTblEntry *rte = exec_rt_fetch(lfirst_int(lc), estate);
-
- UnlockRelationOid(rte->relid, rte->rellockmode);
- }
- }
}
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 772c86e70e9..fdc65c2b42b 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -147,7 +147,6 @@ CreateExecutorState(void)
estate->es_top_eflags = 0;
estate->es_instrument = 0;
estate->es_finished = false;
- estate->es_aborted = false;
estate->es_exprcontexts = NIL;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 8d4d062d579..b1f9c17f98a 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1338,7 +1338,6 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
- NULL,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1363,8 +1362,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
eflags = EXEC_FLAG_SKIP_TRIGGERS;
else
eflags = 0; /* default run-to-completion flags */
- if (!ExecutorStart(es->qd, eflags))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
+ ExecutorStart(es->qd, eflags);
}
es->status = F_EXEC_RUN;
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 3288396def3..ecb2e4ccaa1 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -70,8 +70,7 @@ static int _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes,
Datum *Values, const char *Nulls);
-static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
- CachedPlanSource *plansource, int query_index);
+static int _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount);
static void _SPI_error_callback(void *arg);
@@ -1686,8 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- cplan,
- plansource);
+ cplan);
/*
* Set up options for portal. Default SCROLL type is chosen the same way
@@ -2502,7 +2500,6 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
- int query_index = 0;
spicallbackarg.query = plansource->query_string;
@@ -2693,16 +2690,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
- cplan,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
options->params,
_SPI_current->queryEnv,
0);
-
- res = _SPI_pquery(qdesc, fire_triggers, canSetTag ? options->tcount : 0,
- plansource, query_index);
+ res = _SPI_pquery(qdesc, fire_triggers,
+ canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
}
else
@@ -2799,8 +2794,6 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
my_res = res;
goto fail;
}
-
- query_index++;
}
/* Done with this plan, so release refcount */
@@ -2878,8 +2871,7 @@ _SPI_convert_params(int nargs, Oid *argtypes,
}
static int
-_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
- CachedPlanSource *plansource, int query_index)
+_SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount)
{
int operation = queryDesc->operation;
int eflags;
@@ -2935,16 +2927,7 @@ _SPI_pquery(QueryDesc *queryDesc, bool fire_triggers, uint64 tcount,
else
eflags = EXEC_FLAG_SKIP_TRIGGERS;
- if (queryDesc->cplan)
- {
- ExecutorStartCachedPlan(queryDesc, eflags, plansource, query_index);
- Assert(queryDesc->planstate);
- }
- else
- {
- if (!ExecutorStart(queryDesc, eflags))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
- }
+ ExecutorStart(queryDesc, eflags);
ExecutorRun(queryDesc, ForwardScanDirection, tcount);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 49ad6e83578..ff65867eebe 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -331,7 +331,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->finalrteperminfos = NIL;
glob->finalrowmarks = NIL;
glob->resultRelations = NIL;
- glob->firstResultRels = NIL;
glob->appendRelations = NIL;
glob->partPruneInfos = NIL;
glob->relationOids = NIL;
@@ -571,7 +570,6 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
- result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 150e9f060ee..999a5a8ab5a 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1248,9 +1248,6 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
lappend_int(root->glob->resultRelations,
splan->rootRelation);
}
- root->glob->firstResultRels =
- lappend_int(root->glob->firstResultRels,
- linitial_int(splan->resultRelations));
}
break;
case T_Append:
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 1ae51b1b391..92ddeba78fd 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1226,7 +1226,6 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
- NULL,
NULL);
/*
@@ -2028,8 +2027,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- cplan,
- psrc);
+ cplan);
/* Portal is defined, set the plan ID based on its contents. */
foreach(lc, portal->stmts)
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 8164d0fbb4f..d1593f38b35 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -19,7 +19,6 @@
#include "access/xact.h"
#include "commands/prepare.h"
-#include "executor/execdesc.h"
#include "executor/executor.h"
#include "executor/tstoreReceiver.h"
#include "miscadmin.h"
@@ -38,9 +37,6 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
- CachedPlan *cplan,
- CachedPlanSource *plansource,
- int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -70,7 +66,6 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
- CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -83,7 +78,6 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
- qd->cplan = cplan; /* CachedPlan supplying the plannedstmt */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -129,9 +123,6 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
- * cplan: CachedPlan supplying the plan
- * plansource: CachedPlanSource supplying the cplan
- * query_index: index of the query in plansource->query_list
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -144,9 +135,6 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
- CachedPlan *cplan,
- CachedPlanSource *plansource,
- int query_index,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -158,23 +146,14 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, cplan, sourceText,
+ queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
/*
- * Prepare the plan for execution
+ * Call ExecutorStart to prepare the plan for execution
*/
- if (queryDesc->cplan)
- {
- ExecutorStartCachedPlan(queryDesc, 0, plansource, query_index);
- Assert(queryDesc->planstate);
- }
- else
- {
- if (!ExecutorStart(queryDesc, 0))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
- }
+ ExecutorStart(queryDesc, 0);
/*
* Run the plan to completion.
@@ -515,7 +494,6 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
- portal->cplan,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -535,19 +513,9 @@ PortalStart(Portal portal, ParamListInfo params,
myeflags = eflags;
/*
- * Prepare the plan for execution.
+ * Call ExecutorStart to prepare the plan for execution
*/
- if (portal->cplan)
- {
- ExecutorStartCachedPlan(queryDesc, myeflags,
- portal->plansource, 0);
- Assert(queryDesc->planstate);
- }
- else
- {
- if (!ExecutorStart(queryDesc, myeflags))
- elog(ERROR, "ExecutorStart() failed unexpectedly");
- }
+ ExecutorStart(queryDesc, myeflags);
/*
* This tells PortalCleanup to shut down the executor
@@ -1221,7 +1189,6 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
- int query_index = 0;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1303,9 +1270,6 @@ PortalRunMulti(Portal portal,
{
/* statement can set tag string */
ProcessQuery(pstmt,
- portal->cplan,
- portal->plansource,
- query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1315,9 +1279,6 @@ PortalRunMulti(Portal portal,
{
/* stmt added by rewrite cannot set tag */
ProcessQuery(pstmt,
- portal->cplan,
- portal->plansource,
- query_index,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1382,8 +1343,6 @@ PortalRunMulti(Portal portal,
*/
if (lnext(portal->stmts, stmtlist_item) != NULL)
CommandCounterIncrement();
-
- query_index++;
}
/* Pop the snapshot if we pushed one. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 9bcbc4c3e97..89a1c79e984 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -92,8 +92,7 @@ static void ReleaseGenericPlan(CachedPlanSource *plansource);
static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv,
- bool release_generic);
+ QueryEnvironment *queryEnv);
static bool CheckCachedPlan(CachedPlanSource *plansource);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
@@ -663,17 +662,10 @@ BuildingPlanRequiresSnapshot(CachedPlanSource *plansource)
* The result value is the transient analyzed-and-rewritten query tree if we
* had to do re-analysis, and NIL otherwise. (This is returned just to save
* a tree copying step in a subsequent BuildCachedPlan call.)
- *
- * This also releases and drops the generic plan (plansource->gplan), if any,
- * as most callers will typically build a new CachedPlan for the plansource
- * right after this. However, when called from UpdateCachedPlan(), the
- * function does not release the generic plan, as UpdateCachedPlan() updates
- * an existing CachedPlan in place.
*/
static List *
RevalidateCachedQuery(CachedPlanSource *plansource,
- QueryEnvironment *queryEnv,
- bool release_generic)
+ QueryEnvironment *queryEnv)
{
bool snapshot_set;
List *tlist; /* transient query-tree list */
@@ -772,9 +764,8 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
MemoryContextDelete(qcxt);
}
- /* Drop the generic plan reference, if any, and if requested */
- if (release_generic)
- ReleaseGenericPlan(plansource);
+ /* Drop the generic plan reference if any */
+ ReleaseGenericPlan(plansource);
/*
* Now re-do parse analysis and rewrite. This not incidentally acquires
@@ -937,10 +928,8 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired locks on the "unprunableRelids" set
- * for all plans in plansource->stmt_list. However, the plans are not fully
- * race-condition-free until the executor acquires locks on the prunable
- * relations that survive initial runtime pruning during InitPlan().
+ * On a "true" return, we have acquired the locks needed to run the plan.
+ * (We must do this for the "true" result to be race-condition-free.)
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -1025,8 +1014,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
* Planning work is done in the caller's memory context. The finished plan
* is in a child memory context, which typically should get reparented
* (unless this is a one-shot plan, in which case we don't copy the plan).
- *
- * Note: When changing this, you should also look at UpdateCachedPlan().
*/
static CachedPlan *
BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
@@ -1037,7 +1024,6 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
bool snapshot_set;
bool is_transient;
MemoryContext plan_context;
- MemoryContext stmt_context = NULL;
MemoryContext oldcxt = CurrentMemoryContext;
ListCell *lc;
@@ -1055,7 +1041,7 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
* let's treat it as real and redo the RevalidateCachedQuery call.
*/
if (!plansource->is_valid)
- qlist = RevalidateCachedQuery(plansource, queryEnv, true);
+ qlist = RevalidateCachedQuery(plansource, queryEnv);
/*
* If we don't already have a copy of the querytree list that can be
@@ -1093,19 +1079,10 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
PopActiveSnapshot();
/*
- * Normally, we create a dedicated memory context for the CachedPlan and
- * its subsidiary data. Although it's usually not very large, the context
- * is designed to allow growth if necessary.
- *
- * The PlannedStmts are stored in a separate child context (stmt_context)
- * of the CachedPlan's memory context. This separation allows
- * UpdateCachedPlan() to free and replace the PlannedStmts without
- * affecting the CachedPlan structure or its stmt_list List.
- *
- * For one-shot plans, we instead use the caller's memory context, as the
- * CachedPlan will not persist. stmt_context will be set to NULL in this
- * case, because UpdateCachedPlan() should never get called on a one-shot
- * plan.
+ * Normally we make a dedicated memory context for the CachedPlan and its
+ * subsidiary data. (It's probably not going to be large, but just in
+ * case, allow it to grow large. It's transient for the moment.) But for
+ * a one-shot plan, we just leave it in the caller's memory context.
*/
if (!plansource->is_oneshot)
{
@@ -1114,17 +1091,12 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ALLOCSET_START_SMALL_SIZES);
MemoryContextCopyAndSetIdentifier(plan_context, plansource->query_string);
- stmt_context = AllocSetContextCreate(CurrentMemoryContext,
- "CachedPlan PlannedStmts",
- ALLOCSET_START_SMALL_SIZES);
- MemoryContextCopyAndSetIdentifier(stmt_context, plansource->query_string);
- MemoryContextSetParent(stmt_context, plan_context);
+ /*
+ * Copy plan into the new context.
+ */
+ MemoryContextSwitchTo(plan_context);
- MemoryContextSwitchTo(stmt_context);
plist = copyObject(plist);
-
- MemoryContextSwitchTo(plan_context);
- plist = list_copy(plist);
}
else
plan_context = CurrentMemoryContext;
@@ -1165,10 +1137,8 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
plan->saved_xmin = InvalidTransactionId;
plan->refcount = 0;
plan->context = plan_context;
- plan->stmt_context = stmt_context;
plan->is_oneshot = plansource->is_oneshot;
plan->is_saved = false;
- plan->is_reused = false;
plan->is_valid = true;
/* assign generation number to new plan */
@@ -1179,113 +1149,6 @@ BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
return plan;
}
-/*
- * UpdateCachedPlan
- * Create fresh plans for all queries in the CachedPlanSource, replacing
- * those in the generic plan's stmt_list, and return the plan for the
- * query_index'th query.
- *
- * This function is primarily used by ExecutorStartCachedPlan() to handle
- * cases where the original generic CachedPlan becomes invalid. Such
- * invalidation may occur when prunable relations in the old plan for the
- * query_index'th query are locked in preparation for execution.
- *
- * Note that invalidations received during the execution of the query_index'th
- * query can affect both the queries that have already finished execution
- * (e.g., due to concurrent modifications on prunable relations that were not
- * locked during their execution) and also the queries that have not yet been
- * executed. As a result, this function updates all plans to ensure
- * CachedPlan.is_valid is safely set to true.
- *
- * The old PlannedStmts in plansource->gplan->stmt_list are freed here, so
- * the caller and any of its callers must not rely on them remaining accessible
- * after this function is called.
- */
-PlannedStmt *
-UpdateCachedPlan(CachedPlanSource *plansource, int query_index,
- QueryEnvironment *queryEnv)
-{
- List *query_list = plansource->query_list,
- *plan_list;
- ListCell *l1,
- *l2;
- CachedPlan *plan = plansource->gplan;
- MemoryContext oldcxt;
-
- Assert(ActiveSnapshotSet());
-
- /* Sanity checks (XXX can be Asserts?) */
- if (plan == NULL)
- elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan is NULL");
- else if (plan->is_valid)
- elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_valid is true");
- else if (plan->is_oneshot)
- elog(ERROR, "UpdateCachedPlan() called in the wrong context: plansource->gplan->is_oneshot is true");
-
- /*
- * The plansource might have become invalid since GetCachedPlan() returned
- * the CachedPlan. See the comment in BuildCachedPlan() for details on why
- * this might happen. Although invalidation is likely a false positive as
- * stated there, we make the plan valid to ensure the query list used for
- * planning is up to date.
- *
- * The risk of catching an invalidation is higher here than when
- * BuildCachedPlan() is called from GetCachedPlan(), because this function
- * is normally called long after GetCachedPlan() returns the CachedPlan,
- * so much more processing could have occurred including things that mark
- * the CachedPlanSource invalid.
- *
- * Note: Do not release plansource->gplan, because the upstream callers
- * (such as the callers of ExecutorStartCachedPlan()) would still be
- * referencing it.
- */
- if (!plansource->is_valid)
- query_list = RevalidateCachedQuery(plansource, queryEnv, false);
- Assert(query_list != NIL);
-
- /*
- * Build a new generic plan for all the queries after making a copy to be
- * scribbled on by the planner.
- */
- query_list = copyObject(query_list);
-
- /*
- * Planning work is done in the caller's memory context. The resulting
- * PlannedStmt is then copied into plan->stmt_context after throwing away
- * the old ones.
- */
- plan_list = pg_plan_queries(query_list, plansource->query_string,
- plansource->cursor_options, NULL);
- Assert(list_length(plan_list) == list_length(plan->stmt_list));
-
- MemoryContextReset(plan->stmt_context);
- oldcxt = MemoryContextSwitchTo(plan->stmt_context);
- forboth(l1, plan_list, l2, plan->stmt_list)
- {
- PlannedStmt *plannedstmt = lfirst(l1);
-
- lfirst(l2) = copyObject(plannedstmt);
- }
- MemoryContextSwitchTo(oldcxt);
-
- /*
- * XXX Should this also (re)set the properties of the CachedPlan that are
- * set in BuildCachedPlan() after creating the fresh plans such as
- * planRoleId, dependsOnRole, and saved_xmin?
- */
-
- /*
- * We've updated all the plans that might have been invalidated, so mark
- * the CachedPlan as valid.
- */
- plan->is_valid = true;
-
- /* Also update generic_cost because we just created a new generic plan. */
- plansource->generic_cost = cached_plan_cost(plan, false);
-
- return list_nth_node(PlannedStmt, plan->stmt_list, query_index);
-}
-
/*
* choose_custom_plan: choose whether to use custom or generic plan
*
@@ -1402,13 +1265,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid, but if it is a reused generic plan, not all
- * locks are acquired. In such cases, CheckCachedPlan() does not take locks
- * on relations subject to initial runtime pruning; instead, these locks are
- * deferred until execution startup, when ExecDoInitialPruning() performs
- * initial pruning. The plan's "is_reused" flag is set to indicate that
- * CachedPlanRequiresLocking() should return true when called by
- * ExecDoInitialPruning().
+ * On return, the plan is valid and we have sufficient locks to begin
+ * execution.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1434,7 +1292,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
elog(ERROR, "cannot apply ResourceOwner to non-saved cached plan");
/* Make sure the querytree list is valid and we have parse-time locks */
- qlist = RevalidateCachedQuery(plansource, queryEnv, true);
+ qlist = RevalidateCachedQuery(plansource, queryEnv);
/* Decide whether to use a custom plan */
customplan = choose_custom_plan(plansource, boundParams);
@@ -1446,8 +1304,6 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
Assert(plan->magic == CACHEDPLAN_MAGIC);
- /* Reusing the existing plan, so not all locks may be acquired. */
- plan->is_reused = true;
}
else
{
@@ -1913,7 +1769,7 @@ CachedPlanGetTargetList(CachedPlanSource *plansource,
return NIL;
/* Make sure the querytree list is valid and we have parse-time locks */
- RevalidateCachedQuery(plansource, queryEnv, true);
+ RevalidateCachedQuery(plansource, queryEnv);
/* Get the primary statement and find out what it returns */
pstmt = QueryListGetPrimaryStmt(plansource->query_list);
@@ -2035,7 +1891,7 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
foreach(lc1, stmt_list)
{
PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
- int rtindex;
+ ListCell *lc2;
if (plannedstmt->commandType == CMD_UTILITY)
{
@@ -2053,16 +1909,13 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
continue;
}
- rtindex = -1;
- while ((rtindex = bms_next_member(plannedstmt->unprunableRelids,
- rtindex)) >= 0)
+ foreach(lc2, plannedstmt->rtable)
{
- RangeTblEntry *rte = list_nth_node(RangeTblEntry,
- plannedstmt->rtable,
- rtindex - 1);
+ RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
- Assert(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+ if (!(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
+ continue;
/*
* Acquire the appropriate type of lock on each relation OID. Note
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index e3526e78064..0be1c2b0fff 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,8 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan,
- CachedPlanSource *plansource)
+ CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
Assert(portal->status == PORTAL_NEW);
@@ -300,7 +299,6 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
portal->stmts = stmts;
portal->cplan = cplan;
- portal->plansource = plansource;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 03c5b3d73e5..3b122f79ed8 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,10 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
struct ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, CachedPlan *cplan,
- CachedPlanSource *plansource, int query_index,
- IntoClause *into, struct ExplainState *es,
- const char *queryString,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+ struct ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
diff --git a/src/include/commands/trigger.h b/src/include/commands/trigger.h
index 4180601dcd4..2ed2c4bb378 100644
--- a/src/include/commands/trigger.h
+++ b/src/include/commands/trigger.h
@@ -258,7 +258,6 @@ extern void ExecASTruncateTriggers(EState *estate,
extern void AfterTriggerBeginXact(void);
extern void AfterTriggerBeginQuery(void);
extern void AfterTriggerEndQuery(EState *estate);
-extern void AfterTriggerAbortQuery(void);
extern void AfterTriggerFireDeferred(void);
extern void AfterTriggerEndXact(bool isCommit);
extern void AfterTriggerBeginSubXact(void);
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index ba53305ad42..86db3dc8d0d 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -35,7 +35,6 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
- CachedPlan *cplan; /* CachedPlan that supplies the plannedstmt */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -58,7 +57,6 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
- CachedPlan *cplan,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index ae99407db89..fbe4bf081f7 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -73,7 +73,7 @@
/* Hook for plugins to get control in ExecutorStart() */
-typedef bool (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
+typedef void (*ExecutorStart_hook_type) (QueryDesc *queryDesc, int eflags);
extern PGDLLIMPORT ExecutorStart_hook_type ExecutorStart_hook;
/* Hook for plugins to get control in ExecutorRun() */
@@ -229,11 +229,8 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
/*
* prototypes from functions in execMain.c
*/
-extern bool ExecutorStart(QueryDesc *queryDesc, int eflags);
-extern void ExecutorStartCachedPlan(QueryDesc *queryDesc, int eflags,
- CachedPlanSource *plansource,
- int query_index);
-extern bool standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
@@ -300,30 +297,6 @@ extern void ExecEndNode(PlanState *node);
extern void ExecShutdownNode(PlanState *node);
extern void ExecSetTupleBound(int64 tuples_needed, PlanState *child_node);
-/*
- * Is the CachedPlan in es_cachedplan still valid?
- *
- * Called from InitPlan() because invalidation messages that affect the plan
- * might be received after locks have been taken on runtime-prunable relations.
- * The caller should take appropriate action if the plan has become invalid.
- */
-static inline bool
-ExecPlanStillValid(EState *estate)
-{
- return estate->es_cachedplan == NULL ? true :
- CachedPlanValid(estate->es_cachedplan);
-}
-
-/*
- * Locks are needed only if running a cached plan that might contain unlocked
- * relations, such as a reused generic plan.
- */
-static inline bool
-ExecShouldLockRelations(EState *estate)
-{
- return estate->es_cachedplan == NULL ? false :
- CachedPlanRequiresLocking(estate->es_cachedplan);
-}
/* ----------------------------------------------------------------
* ExecProcNode
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 5b6cadb5a6c..2492282213f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -42,7 +42,6 @@
#include "storage/condition_variable.h"
#include "utils/hsearch.h"
#include "utils/queryenvironment.h"
-#include "utils/plancache.h"
#include "utils/reltrigger.h"
#include "utils/sharedtuplestore.h"
#include "utils/snapshot.h"
@@ -664,7 +663,6 @@ typedef struct EState
* ExecRowMarks, or NULL if none */
List *es_rteperminfos; /* List of RTEPermissionInfo */
PlannedStmt *es_plannedstmt; /* link to top of plan tree */
- CachedPlan *es_cachedplan; /* CachedPlan providing the plan tree */
List *es_part_prune_infos; /* List of PartitionPruneInfo */
List *es_part_prune_states; /* List of PartitionPruneState */
List *es_part_prune_results; /* List of Bitmapset */
@@ -717,7 +715,6 @@ typedef struct EState
int es_top_eflags; /* eflags passed to ExecutorStart */
int es_instrument; /* OR of InstrumentOption flags */
bool es_finished; /* true when ExecutorFinish is done */
- bool es_aborted; /* true when execution was aborted */
List *es_exprcontexts; /* List of ExprContexts within EState */
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 1dd2d1560cb..6567759595d 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -138,9 +138,6 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
- /* "flat" list of integer RT indexes (one per ModifyTable node) */
- List *firstResultRels;
-
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 658d76225e4..f0d514e6e15 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -105,13 +105,6 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
- /*
- * rtable indexes of first target relation in each ModifyTable node in the
- * plan for INSERT/UPDATE/DELETE/MERGE
- */
- /* integer list of RT indexes, or NIL */
- List *firstResultRels;
-
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 07ec5318db7..1baa6d50bfd 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -18,8 +18,6 @@
#include "access/tupdesc.h"
#include "lib/ilist.h"
#include "nodes/params.h"
-#include "nodes/parsenodes.h"
-#include "nodes/plannodes.h"
#include "tcop/cmdtag.h"
#include "utils/queryenvironment.h"
#include "utils/resowner.h"
@@ -153,11 +151,10 @@ typedef struct CachedPlanSource
* The reference count includes both the link from the parent CachedPlanSource
* (if any), and any active plan executions, so the plan can be discarded
* exactly when refcount goes to zero. Both the struct itself and the
- * subsidiary data, except the PlannedStmts in stmt_list live in the context
- * denoted by the context field; the PlannedStmts live in the context denoted
- * by stmt_context. Separate contexts makes it easy to free a no-longer-needed
- * cached plan. (However, if is_oneshot is true, the context does not belong
- * solely to the CachedPlan so no freeing is possible.)
+ * subsidiary data live in the context denoted by the context field.
+ * This makes it easy to free a no-longer-needed cached plan. (However,
+ * if is_oneshot is true, the context does not belong solely to the CachedPlan
+ * so no freeing is possible.)
*/
typedef struct CachedPlan
{
@@ -165,7 +162,6 @@ typedef struct CachedPlan
List *stmt_list; /* list of PlannedStmts */
bool is_oneshot; /* is it a "oneshot" plan? */
bool is_saved; /* is CachedPlan in a long-lived context? */
- bool is_reused; /* is it a reused generic plan? */
bool is_valid; /* is the stmt_list currently valid? */
Oid planRoleId; /* Role ID the plan was created for */
bool dependsOnRole; /* is plan specific to that role? */
@@ -174,10 +170,6 @@ typedef struct CachedPlan
int generation; /* parent's generation number for this plan */
int refcount; /* count of live references to this struct */
MemoryContext context; /* context containing this CachedPlan */
- MemoryContext stmt_context; /* context containing the PlannedStmts in
- * stmt_list, but not the List itself which is
- * in the above context; NULL if is_oneshot is
- * true. */
} CachedPlan;
/*
@@ -249,10 +241,6 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
-extern PlannedStmt *UpdateCachedPlan(CachedPlanSource *plansource,
- int query_index,
- QueryEnvironment *queryEnv);
-
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
@@ -265,30 +253,4 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
-/*
- * CachedPlanRequiresLocking: should the executor acquire additional locks?
- *
- * If the plan is a saved generic plan, the executor must acquire locks for
- * relations that are not covered by AcquireExecutorLocks(), such as partitions
- * that are subject to initial runtime pruning.
- */
-static inline bool
-CachedPlanRequiresLocking(CachedPlan *cplan)
-{
- return !cplan->is_oneshot && cplan->is_reused;
-}
-
-/*
- * CachedPlanValid
- * Returns whether a cached generic plan is still valid.
- *
- * Invoked by the executor to check if the plan has not been invalidated after
- * taking locks during the initialization of the plan.
- */
-static inline bool
-CachedPlanValid(CachedPlan *cplan)
-{
- return cplan->is_valid;
-}
-
#endif /* PLANCACHE_H */
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index ddee031f551..0b62143af8b 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -138,7 +138,6 @@ typedef struct PortalData
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
- CachedPlanSource *plansource; /* CachedPlanSource, for cplan */
ParamListInfo portalParams; /* params to pass to query */
QueryEnvironment *queryEnv; /* environment for query */
@@ -241,8 +240,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
- CachedPlan *cplan,
- CachedPlanSource *plansource);
+ CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
--
2.43.0
* Re: generic plans and "initial" pruning
@ 2025-05-21 10:22 Amit Langote <[email protected]>
parent: Tomas Vondra <[email protected]>
0 siblings, 0 replies; 82+ messages in thread
From: Amit Langote @ 2025-05-21 10:22 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Wed, May 21, 2025 at 3:44 AM Tomas Vondra <[email protected]> wrote:
> On 5/20/25 05:06, Tom Lane wrote:
> > Amit Langote <[email protected]> writes:
> >> Pushed after some tweaks to comments and the test case.
> >
> > My attention was drawn to commit 525392d57 after observing that
> > Valgrind complained about a memory leak in some code that commit added
> > to BuildCachedPlan(). I tried to make sense of said code so I could
> > remove the leak, and eventually arrived at the attached patch, which
> > is part of a series of leak-fixing things hence the high sequence
> > number.
> >
> > Unfortunately, the bad things I speculated about in the added comments
> > seem to be reality. The second attached file is a test case that
> > triggers
> >
> > ...
>
> FYI I added this as a PG18 open item:
>
> https://wiki.postgresql.org/wiki/PostgreSQL_18_Open_Items
Thanks Tomas.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
@ 2025-05-22 08:12 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 2 replies; 82+ messages in thread
From: Amit Langote @ 2025-05-22 08:12 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Wed, May 21, 2025 at 7:22 PM Amit Langote <[email protected]> wrote:
> Fair enough. I’ll revert this and some related changes shortly. WIP
> patch attached.
I have pushed out the revert now.
Note that I’ve only reverted the changes related to deferring locks on
prunable partitions. I’m planning to leave the preparatory commits
leading up to that one in place unless anyone objects. For reference,
here they are in chronological order (the last 3 are bug fixes):
bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt
d47cbf474ec Perform runtime initial pruning outside ExecInitNode()
cbc127917e0 Track unpruned relids to avoid processing pruned relations
75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly
cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly
28317de723b Ensure first ModifyTable rel initialized if all are pruned
I think separating initial pruning from plan node initialization is
still worthwhile on its own, as evidenced by the improvements in
cbc127917e.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
@ 2025-05-22 13:04 Tomas Vondra <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Tomas Vondra @ 2025-05-22 13:04 UTC (permalink / raw)
To: Amit Langote <[email protected]>; Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On 5/22/25 10:12, Amit Langote wrote:
> On Wed, May 21, 2025 at 7:22 PM Amit Langote <[email protected]> wrote:
>> Fair enough. I’ll revert this and some related changes shortly. WIP
>> patch attached.
>
> I have pushed out the revert now.
>
Thank you.
> Note that I’ve only reverted the changes related to deferring locks on
> prunable partitions. I’m planning to leave the preparatory commits
> leading up to that one in place unless anyone objects. For reference,
> here they are in chronological order (the last 3 are bug fixes):
>
> bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt
> d47cbf474ec Perform runtime initial pruning outside ExecInitNode()
> cbc127917e0 Track unpruned relids to avoid processing pruned relations
> 75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly
> cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly
> 28317de723b Ensure first ModifyTable rel initialized if all are pruned
>
> I think separating initial pruning from plan node initialization is
> still worthwhile on its own, as evidenced by the improvements in
> cbc127917e.
>
I'm OK with that in principle, assuming the benefits outweigh the risk
of making backpatching harder. The patches don't seem exceptionally
large / invasive, but I don't know how often we modify these parts.
regards
--
Tomas Vondra
* Re: generic plans and "initial" pruning
@ 2025-05-22 13:50 Robert Haas <[email protected]>
parent: Tom Lane <[email protected]>
1 sibling, 0 replies; 82+ messages in thread
From: Robert Haas @ 2025-05-22 13:50 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Amit Langote <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Tue, May 20, 2025 at 11:38 AM Tom Lane <[email protected]> wrote:
> I still like the core idea of deferring locking, but I don't like
> anything about this implementation of it. It seems like there has
> to be a better and simpler way.
Without particularly defending this implementation, and certainly
without defending its bugs, I just want to say that I'm not convinced
by the idea that there has to be a better and simpler way. We --
principally Amit, but also me and you and others -- have been trying
to find the best way of doing this for probably 5 years now. If you do
something during executor startup, you have to be prepared for
executor startup to force a replan, and if you do something before
executor startup, then you're duplicating executor logic into a new
phase that needs to communicate its results forward to execution
proper. Either approach is awkward and that awkwardness seems to
inevitably bleed into the plan cache specifically. I'd be beyond
delighted if you want to help chart a path through the awkwardness
here, since you know this stuff better than anybody, but I am
skeptical that there is a truly marvelous approach which we've just
managed to overlook for all this time.
--
Robert Haas
EDB: http://www.enterprisedb.com
* Re: generic plans and "initial" pruning
@ 2025-05-23 02:17 Amit Langote <[email protected]>
parent: Tomas Vondra <[email protected]>
0 siblings, 0 replies; 82+ messages in thread
From: Amit Langote @ 2025-05-23 02:17 UTC (permalink / raw)
To: Tomas Vondra <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Thu, May 22, 2025 at 10:04 PM Tomas Vondra <[email protected]> wrote:
> On 5/22/25 10:12, Amit Langote wrote:
> > Note that I’ve only reverted the changes related to deferring locks on
> > prunable partitions. I’m planning to leave the preparatory commits
> > leading up to that one in place unless anyone objects. For reference,
> > here they are in chronological order (the last 3 are bug fixes):
> >
> > bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt
> > d47cbf474ec Perform runtime initial pruning outside ExecInitNode()
> > cbc127917e0 Track unpruned relids to avoid processing pruned relations
> > 75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly
> > cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly
> > 28317de723b Ensure first ModifyTable rel initialized if all are pruned
> >
> > I think separating initial pruning from plan node initialization is
> > still worthwhile on its own, as evidenced by the improvements in
> > cbc127917e.
> >
>
> I'm OK with that in principle, assuming the benefits outweigh the risk
> of making backpatching harder. The patches don't seem exceptionally
> large / invasive, but I don't know how often we modify these parts.
Thanks. I agree it's something to be mindful of, but I don’t expect
the reimplementation of the locking deferral to require changes to
this part of the code again. So barring any surprises, it shouldn't be
the case that the pruning code ends up looking significantly different
in v19.
Also, the actual pruning logic hasn’t changed much -- just where it’s
called from.
Let me know if any of that still raises concerns.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
@ 2025-06-20 12:30 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Amit Langote @ 2025-06-20 12:30 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Thu, May 22, 2025 at 5:12 PM Amit Langote <[email protected]> wrote:
> I have pushed out the revert now.
>
> Note that I’ve only reverted the changes related to deferring locks on
> prunable partitions. I’m planning to leave the preparatory commits
> leading up to that one in place unless anyone objects. For reference,
> here they are in chronological order (the last 3 are bug fixes):
>
> bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt
> d47cbf474ec Perform runtime initial pruning outside ExecInitNode()
> cbc127917e0 Track unpruned relids to avoid processing pruned relations
> 75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly
> cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly
> 28317de723b Ensure first ModifyTable rel initialized if all are pruned
>
> I think separating initial pruning from plan node initialization is
> still worthwhile on its own, as evidenced by the improvements in
> cbc127917e.
I've been thinking about how to address the concerns Tom raised about
the reverted patch. Here's a summary of where my thinking currently
stands.
* CachedPlan invalidation handling:
The first issue is the part of the old design where a CachedPlan
invalidated during executor startup -- while locking unpruned
partitions -- was modified in place to replace the stale PlannedStmts
in its stmt_list with new ones obtained by replanning all queries in
the enclosing CachedPlanSource's query_list. I did that mainly to
ensure that replanning happens as soon as the executor discovers the
plan is invalid, instead of returning to the caller and requiring them
to go back to plancache.c to trigger replanning. There were many
issues with making that approach work in practice, because different
callers of the executor have different ways of running plans from a
CachedPlan -- with pquery.c in particular being hard to refactor
cleanly to support that flow.
The first alternative I came up with is to place only the query whose
PlannedStmt is being initialized into a standalone CachedPlanSource
and create a corresponding standalone CachedPlan. "Standalone" here
means that both objects are "saved" independently of the original
CachedPlanSource and CachedPlan, but are still tracked by the
invalidation callbacks.
But thinking about it more recently, what's actually important is not
whether we construct a new CachedPlan at all, but simply that we
replan just the one query that needs to be run, and use the resulting
PlannedStmt directly. The planner will have taken all required locks,
so we don't need to register the plan with the invalidation machinery
-- concurrent invalidations can't affect correctness.
In that case, the replanned PlannedStmt can be treated as transient
executor-local state, with no need to carry any of the plan cache
infrastructure along with it. To support that, I further assume that,
because replanning and execution happen essentially back-to-back,
there's no opportunity for role-based or xmin-based invalidation (as
is checked for a CachedPlan in CheckCachedPlan()) to affect the plan
in between. If that reasoning holds, then we don't need to register
the replanned statement with the invalidation machinery at all.
Because we wouldn't have touched the original CachedPlan at all, the
stale PlannedStmts in it wouldn't be replaced until the next
GetCachedPlan() call triggers replanning. I'm willing to accept that
as a tradeoff for a less invasive design to handle replanning in the
executor.
Finally, it's worth noting that the executor is always passed the
entire CachedPlan, regardless of which individual statement is being
executed. Without per-statement validity tracking, it's hard for the
executor to tell whether replanning is actually needed for a given
query when the CachedPlan is marked invalid (is_valid=false), making
it impossible to selectively replan just one. To support that, what I
would need is validity tracking at the level of individual
PlannedStmts -- and perhaps even Querys -- in the source's query_list,
with the current is_valid flag effectively serving as the logical AND
of all the individual flags. We didn't need that in the old design,
because we'd replace all statements to mark the CachedPlan valid again
-- though Tom was right to point out flaws in the assumption that
setting is_valid like that was actually safe.
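To make the per-statement idea concrete, here is a minimal sketch of validity tracking at statement granularity. It is only an illustration under stated assumptions: SketchCachedPlan, stmt_valid, and the sketch_* functions are hypothetical names that do not exist in plancache.c, and the fixed-size array stands in for whatever storage a real patch would choose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_STMTS 8

/* Hypothetical stand-in for CachedPlan; not the real struct. */
typedef struct SketchCachedPlan
{
    size_t nstmts;                          /* number of statements in the plan */
    bool   stmt_valid[SKETCH_MAX_STMTS];    /* hypothetical per-statement flags */
} SketchCachedPlan;

/* Mark every statement valid, as after planning all queries. */
void
sketch_plan_init(SketchCachedPlan *plan, size_t nstmts)
{
    assert(nstmts <= SKETCH_MAX_STMTS);
    plan->nstmts = nstmts;
    for (size_t i = 0; i < nstmts; i++)
        plan->stmt_valid[i] = true;
}

/* Record an invalidation that affects a single statement. */
void
sketch_invalidate_stmt(SketchCachedPlan *plan, size_t query_index)
{
    assert(query_index < plan->nstmts);
    plan->stmt_valid[query_index] = false;
}

/* Plan-level validity is the logical AND of the per-statement flags. */
bool
sketch_plan_is_valid(const SketchCachedPlan *plan)
{
    for (size_t i = 0; i < plan->nstmts; i++)
    {
        if (!plan->stmt_valid[i])
            return false;
    }
    return true;
}

/* Would the statement about to be executed need replanning? */
bool
sketch_stmt_needs_replan(const SketchCachedPlan *plan, size_t query_index)
{
    assert(query_index < plan->nstmts);
    return !plan->stmt_valid[query_index];
}
```

With flags like these, the executor could consult only the entry for the statement it is about to run, while existing callers that look at the plan-level flag would see the same answer as today.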
* ExecutorStart() interface damage control:
The other aspect I’ve been thinking about is how to contain the
changes required inside ExecutorStart(), and limit the disruption to
ExecutorStart_hooks in particular, while keeping changes for outside
callers narrowly scoped. In the previous patch, pruning, locking, and
invalidation checking were all done inside InitPlan(), which is called
by standard_ExecutorStart() -- an implementation choice that was
potentially disruptive to extensions using ExecutorStart_hook. Since
such hooks are expected to call standard_ExecutorStart() to perform
core plan initialization, they would have to check afterward whether
the plan had actually been initialized successfully, in case an
invalidation occurred during InitPlan(). That wasn’t optional, and it
made it easy for hook authors to miss the fact that
standard_ExecutorStart() could return without initializing the plan,
breaking expectations that were previously reliable.
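As a rough illustration of that burden, the sketch below models a hook written against the reverted interface, where ExecutorStart_hook_type and standard_ExecutorStart() returned bool. Everything here is a mock: OldQueryDesc and the old_* functions are hypothetical stand-ins, and reading the return value as "true means the plan was initialized" is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for QueryDesc; not the real struct. */
typedef struct OldQueryDesc
{
    bool invalidated_during_init;   /* simulates an invalidation in InitPlan() */
    bool plan_initialized;          /* set once core initialization completed */
    bool hook_state_ready;          /* per-query state owned by the extension */
} OldQueryDesc;

/* Mock of the bool-returning standard_ExecutorStart() (assumed semantics). */
bool
old_standard_ExecutorStart(OldQueryDesc *qd, int eflags)
{
    (void) eflags;
    if (qd->invalidated_during_init)
        return false;           /* returned without initializing the plan */
    qd->plan_initialized = true;
    return true;
}

/* A hook under that interface must check the result before doing its work. */
bool
old_ExecutorStart_hook(OldQueryDesc *qd, int eflags)
{
    if (!old_standard_ExecutorStart(qd, eflags))
        return false;           /* propagate; the caller has to replan */

    /* Only now is it safe to rely on an initialized plan. */
    assert(qd->plan_initialized);
    qd->hook_state_ready = true;
    return true;
}
```

A hook that skipped the check would go on to set up its own state against a plan that was never initialized, which is the failure mode described above.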
Separately, for top-level callers of the executor, the patch
introduced a new entry point, ExecutorStartCachedPlan(), to avoid
requiring each caller to implement its own replanning loop. But that
approach was also awkward, since it required switching to a
nonstandard function just to get correct behavior.
What I’m thinking now is that we should instead move the logic for
pruning, deferred locking, and replanning directly into
ExecutorStart() itself, so that callers no longer have to choose
between ExecutorStart() and a separate entry point that exists solely
to handle invalidation and replanning.
In contrast, the design I’m proposing avoids any need for new executor
entry points -- ExecutorStart() retains its original signature and
behavior, with the added benefit that replanning and pruning are now
handled internally before hooks or standard initialization logic are
invoked. The design requires moving some code from
standard_ExecutorStart() -- specifically the code that sets up the
EState and parameters -- and from InitPlan() -- namely, the parts that
initialize the range table, partition pruning state, and perform
ExecDoInitialPruning().
Callers of ExecutorStart() do still need to pass the CachedPlan, the
CachedPlanSource, and the query_index in the QueryDesc via
CreateQueryDesc(), but the executor’s entry points themselves remain
unchanged.
Importantly, this restructuring would not require any behavioral
changes for existing ExecutorStart_hook implementations. From a hook’s
point of view, this is a code motion change only. Hooks are still
invoked at the same point, but they’re now guaranteed to receive a
plan that is valid and ready for execution. This avoids the control
flow surprises introduced by the reverted patch -- specifically, the
need for hooks to detect whether standard_ExecutorStart() had
completed successfully -- while preserving the executor’s API and
execution contract as they exist in master.
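The control flow I have in mind can be sketched roughly as follows. This is a toy model, not proposed code: NewQueryDesc, the new_* functions, and the simulated invalidation flag are all hypothetical, and the single replan step assumes, as argued earlier, that a freshly planned statement needs no further validity check because the planner has already taken the required locks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QueryDesc; not the real struct. */
typedef struct NewQueryDesc
{
    bool invalidate_on_lock;    /* simulates an invalidation while locking */
    bool plan_valid;            /* is the statement to be run valid? */
    int  replans;               /* how many times this statement was replanned */
    bool plan_initialized;      /* set by standard initialization */
    bool hook_saw_valid_plan;   /* what the hook observed on entry */
} NewQueryDesc;

typedef void (*new_hook_type) (NewQueryDesc *qd, int eflags);
new_hook_type new_ExecutorStart_hook = NULL;

/* Initial pruning plus deferred locking; may discover the plan is stale. */
void
new_prune_and_lock(NewQueryDesc *qd)
{
    qd->plan_valid = !qd->invalidate_on_lock;
}

/* Replan only the statement being run; the result is executor-local. */
void
new_replan_one(NewQueryDesc *qd)
{
    qd->replans++;
    qd->plan_valid = true;
}

/* Core initialization; keeps a void return type. */
void
new_standard_ExecutorStart(NewQueryDesc *qd, int eflags)
{
    (void) eflags;
    assert(qd->plan_valid);
    qd->plan_initialized = true;
}

/* Example extension hook: records what it saw, then calls the standard code. */
void
new_example_hook(NewQueryDesc *qd, int eflags)
{
    qd->hook_saw_valid_plan = qd->plan_valid;
    new_standard_ExecutorStart(qd, eflags);
}

/* Entry point: all validity handling finishes before any hook runs. */
void
new_ExecutorStart(NewQueryDesc *qd, int eflags)
{
    new_prune_and_lock(qd);
    if (!qd->plan_valid)
        new_replan_one(qd);

    if (new_ExecutorStart_hook != NULL)
        new_ExecutorStart_hook(qd, eflags);
    else
        new_standard_ExecutorStart(qd, eflags);
}
```

The point of the arrangement is that neither the hook nor new_standard_ExecutorStart() has anything to report back, so both keep their void signatures.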
I’ll hold off on writing any code for now -- just wanted to lay out
this direction and hear what others think, especially Tom.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
@ 2025-07-17 12:11 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2025-07-17 12:11 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Fri, Jun 20, 2025 at 9:30 PM Amit Langote <[email protected]> wrote:
> On Thu, May 22, 2025 at 5:12 PM Amit Langote <[email protected]> wrote:
> > I have pushed out the revert now.
> >
> > Note that I’ve only reverted the changes related to deferring locks on
> > prunable partitions. I’m planning to leave the preparatory commits
> > leading up to that one in place unless anyone objects. For reference,
> > here they are in chronological order (the last 3 are bug fixes):
> >
> > bb3ec16e14d Move PartitionPruneInfo out of plan nodes into PlannedStmt
> > d47cbf474ec Perform runtime initial pruning outside ExecInitNode()
> > cbc127917e0 Track unpruned relids to avoid processing pruned relations
> > 75dfde13639 Fix an oversight in cbc127917 to handle MERGE correctly
> > cbb9086c9ef Fix bug in cbc127917 to handle nested Append correctly
> > 28317de723b Ensure first ModifyTable rel initialized if all are pruned
> >
> > I think separating initial pruning from plan node initialization is
> > still worthwhile on its own, as evidenced by the improvements in
> > cbc127917e.
>
> I've been thinking about how to address the concerns Tom raised about
> the reverted patch. Here's a summary of where my thinking currently
> stands.
>
> * CachedPlan invalidation handling:
>
> The first issue is the part of the old design where a CachedPlan
> invalidated during executor startup -- while locking unpruned
> partitions -- was modified in place to replace the stale PlannedStmts
> in its stmt_list with new ones obtained by replanning all queries in
> the enclosing CachedPlanSource's query_list. I did that mainly to
> ensure that replanning happens as soon as the executor discovers the
> plan is invalid, instead of returning to the caller and requiring them
> to go back to plancache.c to trigger replanning. There were many
> issues with making that approach work in practice, because different
> callers of the executor have different ways of running plans from a
> CachedPlan -- with pquery.c in particular being hard to refactor
> cleanly to support that flow.
>
> The first alternative I came up with is to place only the query whose
> PlannedStmt is being initialized into a standalone CachedPlanSource
> and create a corresponding standalone CachedPlan. "Standalone" here
> means that both objects are "saved" independently of the original
> CachedPlanSource and CachedPlan, but are still tracked by the
> invalidation callbacks.
>
> But thinking about it more recently, what's actually important is not
> whether we construct a new CachedPlan at all, but simply that we
> replan just the one query that needs to be run, and use the resulting
> PlannedStmt directly. The planner will have taken all required locks,
> so we don't need to register the plan with the invalidation machinery
> -- concurrent invalidations can't affect correctness.
>
> In that case, the replanned PlannedStmt can be treated as transient
> executor-local state, with no need to carry any of the plan cache
> infrastructure along with it. To support that, I further assume that,
> because replanning and execution happen essentially back-to-back,
> there's no opportunity for role-based or xmin-based invalidation (as
> is checked for a CachedPlan in CheckCachedPlan()) to affect the plan
> in between. If that reasoning holds, then we don't need to register
> the replanned statement with the invalidation machinery at all.
>
> Because we wouldn't have touched the original CachedPlan at all, the
> stale PlannedStmts in it wouldn't be replaced until the next
> GetCachedPlan() call triggers replanning. I'm willing to accept that
> as a tradeoff for a less invasive design to handle replanning in the
> executor.
>
> Finally, it's worth noting that the executor is always passed the
> entire CachedPlan, regardless of which individual statement is being
> executed. Without per-statement validity tracking, it's hard for the
> executor to tell whether replanning is actually needed for a given
> query when the CachedPlan is marked invalid (is_valid=false), making
> it impossible to selectively replan just one. To support that, what I
> would need is validity tracking at the level of individual
> PlannedStmts -- and perhaps even Querys -- in the source's query_list,
> with the current is_valid flag effectively serving as the logical AND
> of all the individual flags. We didn't need that in the old design,
> because we'd replace all statements to mark the CachedPlan valid again
> -- though Tom was right to point out flaws in the assumption that
> setting is_valid like that was actually safe.
>
> * ExecutorStart() interface damage control:
>
> The other aspect I’ve been thinking about is how to contain the
> changes required inside ExecutorStart(), and limit the disruption to
> ExecutorStart_hooks in particular, while keeping changes for outside
> callers narrowly scoped. In the previous patch, pruning, locking, and
> invalidation checking were all done inside InitPlan(), which is called
> by standard_ExecutorStart() -- an implementation choice that was
> potentially disruptive to extensions using ExecutorStart_hook. Since
> such hooks are expected to call standard_ExecutorStart() to perform
> core plan initialization, they would have to check afterward whether
> the plan had actually been initialized successfully, in case an
> invalidation occurred during InitPlan(). That wasn’t optional, and it
> made it easy for hook authors to miss the fact that
> standard_ExecutorStart() could return without initializing the plan,
> breaking expectations that were previously reliable.
>
> Separately, for top-level callers of the executor, the patch
> introduced a new entry point, ExecutorStartCachedPlan(), to avoid
> requiring each caller to implement its own replanning loop. But that
> approach was also awkward, since it required switching to a
> nonstandard function just to get correct behavior.
>
> What I’m thinking now is that we should instead move the logic for
> pruning, deferred locking, and replanning directly into
> ExecutorStart() itself. In the reverted patch, callers were affected
> mainly because they had to choose between ExecutorStart() and a new
> entry point, ExecutorStartCachedPlan(), which existed solely to handle
> invalidation and replanning. That divergence from the standard API
> made things awkward at the call site.
>
> In contrast, the design I’m proposing avoids any need for new executor
> entry points -- ExecutorStart() retains its original signature and
> behavior, with the added benefit that replanning and pruning are now
> handled internally before hooks or standard initialization logic are
> invoked. The design requires moving some code from
> standard_ExecutorStart() -- specifically the code that sets up the
> EState and parameters -- and from InitPlan() -- namely, the parts that
> initialize the range table, partition pruning state, and perform
> ExecDoInitialPruning().
>
> The callers of ExecutorStart() do still need to ensure that they pass
> the CachedPlan, the CachedPlanSource, and the query_index in QueryDesc
> via CreateQueryDesc(). The executor’s external API remains unchanged.
>
> Importantly, this restructuring would not require any behavioral
> changes for existing ExecutorStart_hook implementations. From a hook’s
> point of view, this is a code motion change only. Hooks are still
> invoked at the same point, but they’re now guaranteed to receive a
> plan that is valid and ready for execution. This avoids the control
> flow surprises introduced by the reverted patch -- specifically, the
> need for hooks to detect whether standard_ExecutorStart() had
> completed successfully -- while preserving the executor’s API and
> execution contract as they exist in master.
>
> I’ll hold off on writing any code for now -- just wanted to lay out
> this direction and hear what others think, especially Tom.
The refinements I described in my email above might help mitigate some
of those executor-related issues. However, I'm starting to wonder if
it's worth reconsidering our decision to handle pruning, locking, and
validation entirely at executor startup, which was the approach taken
in the reverted patch.
The alternative approach, doing initial pruning and locking within
plancache.c itself (which I floated a while ago), might be worth
revisiting. It avoids the complications we've discussed around the
executor API and preserves the clear separation of concerns that
plancache.c provides, though it does introduce some new layering
concerns, which I describe further below.
To support this, we'd need a mechanism to pass pruning results to the
executor alongside each PlannedStmt. For each PartitionPruneInfo in
the plan, that would include the corresponding PartitionPruneState and
the bitmapset of surviving relids determined by initial pruning. Given
that a CachedPlan can contain multiple PlannedStmts, this would
effectively be a list of pruning results, one per statement. One
reasonable way to handle that might be to define a parallel data
structure, separate from PlannedStmt, constructed by plancache.c and
carried via QueryDesc. The memory and lifetime management would mirror
how ParamListInfo is handled today, leaving the executor API unchanged
and avoiding intrusive changes to PlannedStmt.
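To make the shape of such a parallel structure a bit more concrete, here is a minimal sketch in C. It is only an illustration under stated assumptions: every type and function name below (RelidSet, PruneResult, StmtPruneResults, PlanPruneResults, stmt_surviving_relids) is hypothetical, a 64-bit mask stands in for a Bitmapset, and the PartitionPruneState that would travel alongside each result is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit N set means range-table index N survives. */
typedef uint64_t RelidSet;

/* Hypothetical: outcome of initial pruning for one PartitionPruneInfo. */
typedef struct PruneResult
{
	RelidSet	surviving_relids;
} PruneResult;

/* Hypothetical: all pruning results belonging to one PlannedStmt. */
typedef struct StmtPruneResults
{
	size_t		nresults;
	const PruneResult *results;
} StmtPruneResults;

/* Hypothetical: one entry per statement, parallel to the plan's stmt_list. */
typedef struct PlanPruneResults
{
	size_t		nstmts;
	const StmtPruneResults *stmts;
} PlanPruneResults;

/*
 * Return the union of surviving relids for the statement at query_index,
 * or an empty set if no results were supplied for that statement.
 */
RelidSet
stmt_surviving_relids(const PlanPruneResults *plan, size_t query_index)
{
	RelidSet	all = 0;

	if (plan == NULL || query_index >= plan->nstmts)
		return 0;

	for (size_t i = 0; i < plan->stmts[query_index].nresults; i++)
		all |= plan->stmts[query_index].results[i].surviving_relids;

	return all;
}
```

Keeping the results in their own structure, indexed by statement position, is what would let PlannedStmt stay untouched; the cost is that whoever builds the structure also owns its lifetime.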
However, one potentially problematic aspect of this design is managing
the lifecycle of the relations referenced by PartitionPruneState.
Currently, partitioned table relations are opened by the executor
after entering ExecutorStart() and closed automatically by
ExecEndPlan(), allowing cleanup of pruning states implicitly. If we
perform initial pruning earlier, we'd need to keep these relations
open longer, necessitating explicit cleanup calls (e.g., a new
FinishPartitionPruneState()) invoked by the caller of the executor,
such as from ExecutorEnd() or even higher-level callers. This
introduces some questionable layering by shifting responsibility for
relation management tasks, which ideally belong within the executor,
into its callers.
My sense is that the complexity involved in carrying pruning results
via this parallel data structure was one of the concerns Tom raised
previously, alongside the significant pruning code refactoring that
the earlier patch required. The latter, at least, should no longer be
necessary given recent code improvements.
That's about as many approaches as I can think of, and I would
really appreciate others' thoughts on these alternatives.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
@ 2025-07-22 06:43 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2025-07-22 06:43 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Thu, Jul 17, 2025 at 9:11 PM Amit Langote <[email protected]> wrote:
> The refinements I described in my email above might help mitigate some
> of those executor-related issues. However, I'm starting to wonder if
> it's worth reconsidering our decision to handle pruning, locking, and
> validation entirely at executor startup, which was the approach taken
> in the reverted patch.
>
> The alternative approach, doing initial pruning and locking within
> plancache.c itself (which I floated a while ago), might be worth
> revisiting. It avoids the complications we've discussed around the
> executor API and preserves the clear separation of concerns that
> plancache.c provides, though it does introduce some new layering
> concerns, which I describe further below.
>
> To support this, we'd need a mechanism to pass pruning results to the
> executor alongside each PlannedStmt. For each PartitionPruneInfo in
> the plan, that would include the corresponding PartitionPruneState and
> the bitmapset of surviving relids determined by initial pruning. Given
> that a CachedPlan can contain multiple PlannedStmts, this would
> effectively be a list of pruning results, one per statement. One
> reasonable way to handle that might be to define a parallel data
> structure, separate from PlannedStmt, constructed by plancache.c and
> carried via QueryDesc. The memory and lifetime management would mirror
> how ParamListInfo is handled today, leaving the executor API unchanged
> and avoiding intrusive changes to PlannedStmt.
>
> However, one potentially problematic aspect of this design is managing
> the lifecycle of the relations referenced by PartitionPruneState.
> Currently, partitioned table relations are opened by the executor
> after entering ExecutorStart() and closed automatically by
> ExecEndPlan(), allowing cleanup of pruning states implicitly. If we
> perform initial pruning earlier, we'd need to keep these relations
> open longer, necessitating explicit cleanup calls (e.g., a new
> FinishPartitionPruneState()) invoked by the caller of the executor,
> such as from ExecutorEnd() or even higher-level callers. This
> introduces some questionable layering by shifting responsibility for
> relation management tasks, which ideally belong within the executor,
> into its callers.
>
> My sense is that the complexity involved in carrying pruning results
> via this parallel data structure was one of the concerns Tom raised
> previously, alongside the significant pruning code refactoring that
> the earlier patch required. The latter, at least, should no longer be
> necessary given recent code improvements.
One point I forgot to mention about this approach is that we'd also
need to ensure permissions on parent relations are checked before
performing initial pruning in plancache.c, since pruning may involve
evaluating user-provided expressions. So in effect, we'd need to
invoke not just ExecDoInitialPruning(), but also
ExecCheckPermissions(), or some variant of it, prior to executor
startup. While manageable, it does add slightly to the complexity.
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
@ 2025-11-12 14:17 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2025-11-12 14:17 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Hi,
On Tue, Jul 22, 2025 at 3:43 PM Amit Langote <[email protected]> wrote:
> On Thu, Jul 17, 2025 at 9:11 PM Amit Langote <[email protected]> wrote:
> > The refinements I described in my email above might help mitigate some
> > of those executor-related issues. However, I'm starting to wonder if
> > it's worth reconsidering our decision to handle pruning, locking, and
> > validation entirely at executor startup, which was the approach taken
> > in the reverted patch.
> >
> > The alternative approach, doing initial pruning and locking within
> > plancache.c itself (which I floated a while ago), might be worth
> > revisiting. It avoids the complications we've discussed around the
> > executor API and preserves the clear separation of concerns that
> > plancache.c provides, though it does introduce some new layering
> > concerns, which I describe further below.
> >
> > To support this, we'd need a mechanism to pass pruning results to the
> > executor alongside each PlannedStmt. For each PartitionPruneInfo in
> > the plan, that would include the corresponding PartitionPruneState and
> > the bitmapset of surviving relids determined by initial pruning. Given
> > that a CachedPlan can contain multiple PlannedStmts, this would
> > effectively be a list of pruning results, one per statement. One
> > reasonable way to handle that might be to define a parallel data
> > structure, separate from PlannedStmt, constructed by plancache.c and
> > carried via QueryDesc. The memory and lifetime management would mirror
> > how ParamListInfo is handled today, leaving the executor API unchanged
> > and avoiding intrusive changes to PlannedStmt.
> >
> > However, one potentially problematic aspect of this design is managing
> > the lifecycle of the relations referenced by PartitionPruneState.
> > Currently, partitioned table relations are opened by the executor
> > after entering ExecutorStart() and closed automatically by
> > ExecEndPlan(), allowing cleanup of pruning states implicitly. If we
> > perform initial pruning earlier, we'd need to keep these relations
> > open longer, necessitating explicit cleanup calls (e.g., a new
> > FinishPartitionPruneState()) invoked by the caller of the executor,
> > such as from ExecutorEnd() or even higher-level callers. This
> > introduces some questionable layering by shifting responsibility for
> > relation management tasks, which ideally belong within the executor,
> > into its callers.
> >
> > My sense is that the complexity involved in carrying pruning results
> > via this parallel data structure was one of the concerns Tom raised
> > previously, alongside the significant pruning code refactoring that
> > the earlier patch required. The latter, at least, should no longer be
> > necessary given recent code improvements.
>
> One point I forgot to mention about this approach is that we'd also
> need to ensure permissions on parent relations are checked before
> performing initial pruning in plancache.c, since pruning may involve
> evaluating user-provided expressions. So in effect, we'd need to
> invoke not just ExecDoInitialPruning(), but also
> ExecCheckPermissions(), or some variant of it, prior to executor
> startup. While manageable, it does add slightly to the complexity.
Sorry for the absence. I've now implemented the approach mentioned
above and split it into a series of reasonably isolated patches.
The key idea is to avoid taking unnecessary locks when reusing a
cached plan. To achieve that, we need to perform initial partition
pruning during cached plan reuse in plancache.c so that only surviving
partitions are locked. This requires some plumbing to reuse the result
of this "early" pruning during executor startup, because repeating the
pruning logic would be both inefficient and potentially inconsistent
-- what if you get different results the second time? (I don't have
proof that this can happen, but some earlier emails mention the
theoretical risk, so better to be safe.)
So this patch introduces ExecutorPrep(), which allows executor
metadata such as initial pruning results (valid subplan indexes) and
full unpruned_relids to be computed ahead of execution and reused
later by ExecutorStart() and during QueryDesc setup in parallel
workers using the results shared by the leader. The parallel query bit
was discussed previously at [1], though I didn’t have a solution I
liked then.
This revives an idea that was last implemented in the patch (v30)
posted on Dec 16, 2022. In retrospect, I understand the hesitation Tom
might have had about the patch at the time -- its changes to enable
early pruning and then feed the results into ExecutorStart() were less
than pretty. Thanks to the initial pruning code refactoring that I
committed in Postgres 18, those changes now seem much more principled
and modular IMO.
The patch set is structured as follows:
* Refactor partition pruning initialization (0001): separates the
setup of the pruning state from its execution by introducing
ExecCreatePartitionPruneStates(). This makes the pruning logic easier
to reuse and adds flexibility to do only the setup but skip pruning in
some cases.
* Introduce ExecutorPrep infrastructure (0002): adds ExecutorPrep()
and ExecPrep as a formal way to perform executor setup ahead of
execution. This enables caching or transferring pruning results and
other metadata without triggering execution. ExecutorStart() can now
consume precomputed prep state from the EState created during
ExecutorPrep(). ExecPrepCleanup() handles cleanup when the plan is
invalidated during prep and is therefore never executed; otherwise,
the state is cleaned up in the regular ExecutorEnd() path.
* Allow parallel workers to reuse leader pruning results (0003): lets
workers reuse the leader’s initial pruning results (valid subplan
indexes) and unpruned_relids via ExecutorPrep(). This adds a
verification step to check that leader and worker decisions match,
throwing an error if they don’t -- so "reuse" is a bit of a lie.
Should that check be debug-only? (Maybe not.) As mentioned above, this
was previously discussed at [1].
* Enable pruning-aware locking in cached / generic plan reuse (0004):
extends GetCachedPlan() and CheckCachedPlan() to call ExecutorPrep()
on each PlannedStmt in the CachedPlan, locking only surviving
partitions. Adds CachedPlanPrepData to pass this through plan cache
APIs and down to execution via QueryDesc. Also reinstates the
firstResultRel locking rule added in 28317de72 but later lost due to
the revert of the earlier pruning patch, to ensure correctness when
all target partitions are pruned.
This approach keeps plan caching and validation logic self-contained
in plancache.c and avoids invasive executor API changes.
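As a rough illustration of the locking rule the series is after, the sketch below models relation sets as 64-bit masks. It is a toy under stated assumptions, not code from the patches: RelidSet, relids_to_lock(), count_relids(), and pruning_results_match() are hypothetical names, and the leader/worker comparison here uses strict equality, which is an assumption and not necessarily the exact test the 0003 patch performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit N set means range-table index N. */
typedef uint64_t RelidSet;

/*
 * Relations to lock when reusing a cached plan: everything that is not
 * subject to initial pruning, plus only those prunable relations that
 * survived it.
 */
RelidSet
relids_to_lock(RelidSet unprunable, RelidSet prunable, RelidSet surviving)
{
	return unprunable | (prunable & surviving);
}

/* Count the members of a set by clearing the lowest set bit repeatedly. */
int
count_relids(RelidSet set)
{
	int			n = 0;

	while (set != 0)
	{
		set &= set - 1;
		n++;
	}
	return n;
}

/* Hypothetical leader/worker consistency check (strict equality). */
bool
pruning_results_match(RelidSet leader, RelidSet worker)
{
	return leader == worker;
}
```

With one unprunable parent and four prunable partitions of which a single one survives, relids_to_lock() yields two relations instead of five, which is the effect the benchmark below measures at larger partition counts.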
Benchmark results:
echo "plan_cache_mode = force_generic_plan" >> $PGDATA/postgresql.conf
for p in 32 64 128 256 512 1024; do
  pgbench -i --partitions=$p > /dev/null 2>&1
  echo -ne "$p\t"
  pgbench -n -S -T10 -Mprepared | grep tps
done
Master
32 tps = 23841.822407 (without initial connection time)
64 tps = 21578.619816 (without initial connection time)
128 tps = 18090.500707 (without initial connection time)
256 tps = 14152.248201 (without initial connection time)
512 tps = 9432.708423 (without initial connection time)
1024 tps = 5873.696475 (without initial connection time)
Patched
32 tps = 24724.245798 (without initial connection time)
64 tps = 24858.206407 (without initial connection time)
128 tps = 24652.655269 (without initial connection time)
256 tps = 23656.756615 (without initial connection time)
512 tps = 22299.865769 (without initial connection time)
1024 tps = 21911.704317 (without initial connection time)
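For readers who want the ratios rather than the raw numbers, the small C sketch below computes the per-row speedup from the figures above. The tps values are copied verbatim from the two result sets; the array and function names are just illustrative. Worked by hand, the ratio is roughly 1.04x at 32 partitions and roughly 3.7x at 1024 partitions: master loses about three quarters of its throughput over that range, while the patched build loses about 11%.

```c
#include <assert.h>
#include <stdio.h>

#define NRUNS 6

/* Partition counts and tps figures copied from the benchmark output above. */
const int	partitions[NRUNS] = {32, 64, 128, 256, 512, 1024};
const double master_tps[NRUNS] = {
	23841.822407, 21578.619816, 18090.500707,
	14152.248201, 9432.708423, 5873.696475
};
const double patched_tps[NRUNS] = {
	24724.245798, 24858.206407, 24652.655269,
	23656.756615, 22299.865769, 21911.704317
};

/* Ratio of patched to master throughput for row i. */
double
speedup(int i)
{
	return patched_tps[i] / master_tps[i];
}

/* Print one "partitions<TAB>speedup" line per benchmark row. */
void
print_speedups(void)
{
	for (int i = 0; i < NRUNS; i++)
		printf("%d\t%.2fx\n", partitions[i], speedup(i));
}
```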
Comments welcome.
[1] https://www.postgresql.org/message-id/CA%2BHiwqFA%3DswkzgGK8AmXUNFtLeEXFJwFyY3E7cTxvL46aa1OTw%40mail...
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v1-0003-Reuse-partition-pruning-results-in-parallel-worke.patch (9.0K, 2-v1-0003-Reuse-partition-pruning-results-in-parallel-worke.patch)
From d23a05d6f412dcbfd38a910331527765999d78e9 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v1 3/4] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce ExecCheckInitialPruningResults() to verify that the results
match what the worker would compute. This check helps catch
inconsistencies across leader and worker pruning logic.
While valuable on its own, this change also lays the foundation for
future optimizations where the leader may take locks only on
surviving partitions. Ensuring that workers follow identical pruning
decisions makes such selective locking safe.
---
src/backend/executor/execParallel.c | 67 +++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 35 +++++++++++++++
src/include/executor/execPartition.h | 1 +
3 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aedbd9566d6..f16ef184c68 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -65,6 +66,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -608,12 +611,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -642,6 +651,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -668,6 +679,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -769,6 +790,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1263,10 +1294,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ ExecPrep *prep = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1279,9 +1315,38 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false);
+ Assert(prep->prep_estate);
+
+ prep->prep_estate->es_part_prune_results = part_prune_results;
+ prep->prep_estate->es_unpruned_relids =
+ bms_add_members(prep->prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * Verify that the pruning results passed from the leader match
+ * what the worker would independently compute.
+ */
+ ExecCheckInitialPruningResults(prep->prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
- NULL,
+ prep,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 187a480e508..3b450e3373f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1872,6 +1872,41 @@ ExecDoInitialPruning(EState *estate)
}
}
+/*
+ * ExecCheckInitialPruningResults
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ */
+void
+ExecCheckInitialPruningResults(EState *estate)
+{
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (bms_nonempty_difference(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unprunable_relids in parallel worker");
+ }
+}
+
/*
* ExecInitPartitionExecPruning
* Initialize the data structures needed for runtime "exec" partition
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index ba8cc594fc9..126efd008e5 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -132,6 +132,7 @@ typedef struct PartitionPruneState
extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
+extern void ExecCheckInitialPruningResults(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
--
2.47.3
[application/octet-stream] v1-0001-Refactor-partition-pruning-initialization-for-cla.patch (7.7K, 3-v1-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 243d407de86b0a73b9bd8c8dbc541f630eb33747 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:18:24 +0900
Subject: [PATCH v1 1/4] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Also simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++-----------
src/include/executor/execPartition.h | 1 +
2 files changed, 43 insertions(+), 28 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index aa12e9ad2ea..88b150c8d77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1772,6 +1771,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1796,6 +1798,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1803,11 +1828,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1825,20 +1850,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -1846,8 +1864,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -1965,14 +1981,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2206,8 +2220,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2219,8 +2233,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3b3f46aced0..ba8cc594fc9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
[application/octet-stream] v1-0004-Use-pruning-aware-locking-in-cached-plans.patch (25.0K, 4-v1-0004-Use-pruning-aware-locking-in-cached-plans.patch)
From ddffccd68513bb0e68d6cf75810cf64cf9a4d757 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v1 4/4] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry ExecutorPrep results
through the plan caching layer. Adjust call sites in SPI,
functions, portals, and EXPLAIN to propagate this data.
This ensures pruning decisions made during initial pruning are
consistently reused without redoing pruning logic in executor paths
like parallel workers. It also lays the groundwork for
pruning-dependent lock behavior during plan reuse.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
---
src/backend/commands/prepare.c | 15 +-
src/backend/executor/functions.c | 14 +-
src/backend/executor/nodeModifyTable.c | 4 +-
src/backend/executor/spi.c | 22 ++-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 3 +
src/backend/tcop/postgres.c | 7 +-
src/backend/utils/cache/plancache.c | 223 ++++++++++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 7 +
src/include/utils/plancache.h | 23 ++-
11 files changed, 299 insertions(+), 23 deletions(-)
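As a rough illustration of the locking rule in the commit message above, the sketch below computes which relations a second locking pass would cover once the unprunable relations have been locked: the survivors of initial pruning, minus those already locked, plus the first result relation of each ModifyTable node. It is a toy under stated assumptions: RelidSet is a plain bitmask rather than a Bitmapset, lock_counts models the lock manager as per-relation counters, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bitmask set: bit i set means RT index i is a member. */
typedef uint32_t RelidSet;

/*
 * Relations to lock in the second pass, after the unprunable ones have
 * been locked separately.  Mirrors the order described in the commit
 * message: take the unpruned set, drop what is already locked, then add
 * the first result relations even if pruning removed them.
 */
static RelidSet
toy_extra_relids_to_lock(RelidSet unprunable, RelidSet unpruned,
						 RelidSet first_result_rels)
{
	RelidSet	extra = unpruned & ~unprunable;

	extra |= first_result_rels;
	return extra;
}

/* Acquire or release one lock per member, tracked as simple counters. */
static void
toy_lock_relids(int lock_counts[32], RelidSet relids, bool acquire)
{
	for (int rti = 0; rti < 32; rti++)
	{
		if (relids & ((RelidSet) 1 << rti))
			lock_counts[rti] += acquire ? 1 : -1;
	}
}
```

Releasing with the same two sets that were used for acquisition leaves every counter at zero, which is why the release path needs the pruning result saved at acquire time instead of recomputing it.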
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index afd449c73ba..10fdff403b9 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,9 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +208,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_list,
cplan);
/*
@@ -575,6 +578,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_list;
ListCell *p;
@@ -633,8 +637,11 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +660,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/* Explain each query */
i = 0;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 633310c5f5b..8fc22fbd283 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ ExecPrep *prep; /* ExecutorPrep() output for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ int i;
/*
* Clean up after previous query, if there was one.
@@ -695,10 +698,13 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
- NULL);
+ NULL,
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -719,9 +725,12 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ i = 0;
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ ExecPrep *prep = cprep.prep_list ? list_nth(cprep.prep_list, i) :
+ NULL;
execution_state *newes;
/*
@@ -763,6 +772,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep = prep;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1362,7 +1372,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
- NULL,
+ es->prep,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4c5647ac38a..c5812612f8d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4648,8 +4648,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 7a3cb944d6f..72d52baff4b 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1689,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_list, /* XXX - need copy? */
cplan);
/*
@@ -2078,6 +2082,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2106,12 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2501,6 +2509,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ CachedPlanPrepData cprep = {0};
List *prep_list;
int i;
@@ -2577,11 +2586,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c4fd646b999..4c76e78c1da 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -608,6 +608,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ccdc9bc264a..229b39060ae 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1274,6 +1274,9 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
lappend_int(root->glob->resultRelations,
splan->rootRelation);
}
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels,
+ linitial_int(splan->resultRelations));
}
break;
case T_Append:
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d3964a12a14..82972beee70 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1639,6 +1639,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2021,7 +2022,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2034,7 +2037,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_list,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 6661d2c6b73..ebcf601fce7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,7 +93,7 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
@@ -101,6 +101,8 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -137,6 +139,26 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
+/*
+ * Lock acquisition policy for execution locks.
+ *
+ * LOCK_ALL acquires locks on all relations mentioned in the plan,
+ * reproducing the behavior of AcquireExecutorLocks().
+ *
+ * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
+ * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
+ * partitions remaining after performing initial pruning.
+ */
+typedef enum LockPolicy
+{
+ LOCK_ALL,
+ LOCK_UNPRUNED,
+} LockPolicy;
+
+static void AcquireExecutorLocksWithPolicy(List *stmt_list,
+ LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep);
+
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -938,7 +960,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ *
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The resulting ExecPrep structures are saved in cprep for
+ * later reuse by ExecutorStart().
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
@@ -947,7 +974,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -975,13 +1002,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1003,7 +1032,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
}
/*
@@ -1283,6 +1312,10 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the results in
+ * cprep->prep_list. These are intended to be passed later to ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1293,7 +1326,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1315,7 +1349,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (PrepAndCheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1902,6 +1938,32 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
+/*
+ * AcquireExecutorLocksWithPolicy
+ * Acquire or release execution locks for a plan according to
+ * the specified policy.
+ *
+ * The policy determines whether all relations or only unpruned ones are locked.
+ * For LOCK_UNPRUNED, ExecutorPrep is invoked to identify surviving partitions
+ * and its result is populated in cprep.
+ */
+static void
+AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ switch (policy)
+ {
+ case LOCK_ALL:
+ AcquireExecutorLocks(stmt_list, acquire);
+ break;
+ case LOCK_UNPRUNED:
+ AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
+ break;
+ default:
+ elog(ERROR, "invalid LockPolicy");
+ }
+}
+
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
@@ -1954,6 +2016,153 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * This function uses ExecutorPrep to identify which partitions survive
+ * initial runtime pruning and locks only those, along with any unprunable
+ * base relations. During acquire, the resulting ExecPrep objects are stored
+ * in cprep->prep_list for later reuse. During release, those same ExecPrep
+ * objects are used to identify what to unlock.
+ *
+ * Unlike AcquireExecutorLocks(), which locks all relations listed in the
+ * PlannedStmt's rtable (LOCK_ALL policy), this function selectively locks
+ * only those rels that may be referenced during execution.
+ *
+ * prep_list is extended during acquire and must match stmt_list during
+ * release. Memory allocation happens in cprep->context.
+ */
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext = MemoryContextSwitchTo(cprep->context);
+ ListCell *lc1;
+ List *prep_list;
+ int i;
+
+ Assert(cprep);
+
+ /*
+ * When releasing locks, use the ExecPrep list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_list = cprep->prep_list;
+ Assert(acquire || list_length(prep_list) == list_length(stmt_list));
+
+ i = 0;
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecPrep *prep;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocks(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_list = lappend(cprep->prep_list, NULL);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ prep = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true);
+ Assert(prep || plannedstmt->partPruneInfos == NULL);
+ cprep->prep_list = lappend(cprep->prep_list, prep);
+ }
+ else
+ prep = list_nth(prep_list, i++);
+
+ Assert(prep == NULL || prep->prep_estate);
+ if (prep)
+ {
+ EState *prep_estate = prep->prep_estate;
+
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * firstResultRels may contain pruned partitions that must still be
+ * locked to satisfy executor assumptions (see comments in
+ * ExecInitModifyTable()). Ensure they're included here.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+
+ /* Clean up prep if releasing locks. */
+ if (!acquire)
+ ExecPrepCleanup(prep);
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 30d889b54c5..6fb86dc05f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -141,6 +141,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..42b51299ece 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,13 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE
+ */
+ /* integer list of RT indexes, or NIL */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a82b66d4bc2..59f0b0fc4a4 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,26 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * Populated by GetCachedPlan() when ExecutorPrep is run on a generic plan.
+ *
+ * prep_list: results from ExecutorPrep(), one per PlannedStmt
+ * params: parameters that may be used during ExecutorPrep (e.g., pruning)
+ * context: memory context to allocate ExecutorPrep results in
+ * owner: resource owner to associate ExecutorPrep resources with
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_list; /* List of ExecPrep */
+ ParamListInfo params;
+ MemoryContext context;
+ ResourceOwner owner;
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +260,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.47.3
[application/octet-stream] v1-0002-Introduce-ExecutorPrep-infrastructure-for-pre-exe.patch (29.9K, 5-v1-0002-Introduce-ExecutorPrep-infrastructure-for-pre-exe.patch)
From e9689618f2889f224eb62e9ff4fb5251285ecdb3 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v1 2/4] Introduce ExecutorPrep infrastructure for
pre-execution setup
Add ExecutorPrep() and ExecPrep to support setting up executor
metadata like range table initialization and partition pruning
ahead of actual execution. This enables execution paths to
perform setup independently of running the plan.
For example, plan validation can compute and consume this
metadata without executing the query. Parallel query workers
can receive pre-initialized state from the leader and pass it
to ExecutorStart, avoiding redundant setup.
ExecutorStart now accepts a prep-estate from QueryDesc to skip
repeating initialization. The ExecPrep wrapper manages cleanup
and signals ownership of the estate. PrepPlan() encapsulates
shared setup logic.
Call sites, including Portal, SPI, and EXPLAIN, are updated to
support passing down the prep data. These changes are mostly
mechanical and clarify the separation between setup and actual
execution.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 11 +-
src/backend/executor/README | 9 +-
src/backend/executor/execMain.c | 192 +++++++++++++++++++++++----
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 3 +
src/backend/executor/functions.c | 1 +
src/backend/executor/spi.c | 10 ++
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 27 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 3 +-
src/include/executor/executor.h | 10 ++
src/include/nodes/execnodes.h | 55 ++++++++
src/include/utils/portal.h | 2 +
21 files changed, 308 insertions(+), 38 deletions(-)
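The prepare-once, start-later flow from the commit message above might look roughly like this in miniature. It is only a sketch: DemoEState, DemoPrep, the owns_estate flag, demo_setup_count and all demo_* functions are invented for illustration, and the actual ExecPrep layout and ExecutorStart() signature in the patch may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the PostgreSQL EState or ExecPrep. */
typedef struct DemoEState
{
	int			n_range_table_entries;
} DemoEState;

typedef struct DemoPrep
{
	DemoEState *estate;			/* result of the early setup */
	bool		owns_estate;	/* true until a start call adopts it */
} DemoPrep;

static int	demo_setup_count = 0;	/* counts how often setup really runs */

/* The setup work we want to avoid repeating. */
static DemoEState *
demo_setup(int n_rtes)
{
	DemoEState *estate = malloc(sizeof(*estate));

	if (estate == NULL)
		abort();
	estate->n_range_table_entries = n_rtes;
	demo_setup_count++;
	return estate;
}

/* Run setup ahead of execution and wrap the result. */
static DemoPrep *
demo_executor_prep(int n_rtes)
{
	DemoPrep   *prep = malloc(sizeof(*prep));

	if (prep == NULL)
		abort();
	prep->estate = demo_setup(n_rtes);
	prep->owns_estate = true;
	return prep;
}

/* Adopt the prepared state if one is supplied; otherwise set up afresh. */
static DemoEState *
demo_executor_start(DemoPrep *prep, int n_rtes)
{
	if (prep != NULL)
	{
		assert(prep->estate != NULL);
		prep->owns_estate = false;	/* ownership moves to the caller */
		return prep->estate;
	}
	return demo_setup(n_rtes);
}

/* Free the wrapper, and the state too if nobody adopted it. */
static void
demo_prep_cleanup(DemoPrep *prep)
{
	if (prep == NULL)
		return;
	if (prep->owns_estate)
		free(prep->estate);
	free(prep);
}
```

The ownership flag is one simple way to let cleanup be called unconditionally: a prep whose state was handed to the start call frees only the wrapper, while an unused prep frees everything.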
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..5efbb0949c2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -870,7 +870,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 1ccc2e55c64..9eabe4920cd 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -334,7 +334,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e699f8595e..d6ab3697dd9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -548,7 +549,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, prep, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 93ef1ad106f..3cca6d45ec1 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -993,6 +993,7 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ef7c0d624f1..30cbf9f264f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -437,7 +437,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index ec96c2efcd3..ac1ddd25aba 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL),
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 34b6410d6a2..afd449c73ba 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,6 +576,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_list;
ListCell *p;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -585,6 +587,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i;
if (es->memory)
{
@@ -650,14 +653,20 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_list = NIL;
/* Explain each query */
+ i = 0;
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecPrep *prep = prep_list ?
+ (ExecPrep *) list_nth(prep_list, i) : NULL;
+ i++;
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..6e481398f18 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,10 +291,17 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ [Optional] ExecutorPrep
+ - May be run before ExecutorStart (e.g., for plan validation).
+ - Performs range table initialization, permission checks, and
+ initial partition pruning.
+ - Returns an ExecPrep wrapper with EState that ExecutorStart may
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
+ CreateExecutorState (or reuse one from ExecPrep if present)
creates per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..1b96b251c34 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -75,6 +75,7 @@ ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
static void InitPlan(QueryDesc *queryDesc, int eflags);
+static void PrepPlan(EState *estate, bool do_initial_pruning);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
static void ExecEndPlan(PlanState *planstate, EState *estate);
@@ -171,8 +172,24 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
*/
- estate = CreateExecutorState();
+ if (queryDesc->prep)
+ {
+ estate = queryDesc->prep->prep_estate;
+
+ /*
+ * Executor is adopting the prep's EState. Mark it so ExecPrepCleanup()
+ * doesn't try to free it redundantly.
+ */
+ queryDesc->prep->owns_estate = false;
+ }
+ else
+ estate = CreateExecutorState();
+
queryDesc->estate = estate;
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +280,143 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart(); see InitPlan().
+ *
+ * Returns an ExecPrep wrapper that owns the EState and can be reused
+ * or cleaned up later. Returns NULL if no prep is needed (e.g. no pruning).
+ */
+ExecPrep *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ Assert(pstmt->commandType != CMD_UTILITY);
+
+ /* No pruning needed -- let normal ExecutorStart handle setup later. */
+ if (pstmt->partPruneInfos == NIL)
+ return NULL;
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ PrepPlan(estate, do_initial_pruning);
+
+ CurrentResourceOwner = oldowner;
+
+ return CreateExecPrep(estate, CurrentMemoryContext, NULL, NULL);
+}
+
+/*
+ * PrepPlan: initialize executor metadata needed before plan execution.
+ *
+ * Sets up permissions, range table, and partition pruning infrastructure.
+ * If do_initial_pruning is true, performs initial pruning and stores the
+ * resulting subplan indexes in es_part_prune_results. Otherwise, this step
+ * is skipped, typically when results are provided externally (e.g., in
+ * parallel workers).
+ *
+ * Called from both ExecutorPrep() and InitPlan().
+ */
+static void
+PrepPlan(EState *estate, bool do_initial_pruning)
+{
+ PlannedStmt *pstmt = estate->es_plannedstmt;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false — they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+}
+
+/*
+ * CreateExecPrep: initialize ExecPrep wrapper with optional cleanup metadata.
+ */
+ExecPrep *
+CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg)
+{
+ ExecPrep *prep = palloc0(sizeof(ExecPrep));
+
+ prep->prep_estate = estate;
+ prep->context = context;
+ prep->cleanup = cleanup;
+ prep->cleanup_arg = cleanup_arg;
+ prep->owns_estate = true;
+
+ return prep;
+}
+
+/*
+ * ExecPrepCleanup: free ExecPrep resources not adopted by the executor.
+ *
+ * Only frees the EState if it wasn't taken over by ExecutorStart().
+ * Always runs the optional user-defined cleanup callback.
+ */
+void
+ExecPrepCleanup(ExecPrep *prep)
+{
+ if (prep == NULL)
+ return;
+
+ if (prep->prep_estate && prep->owns_estate)
+ {
+ ExecCloseRangeTableRelations(prep->prep_estate);
+ FreeExecutorState(prep->prep_estate);
+ }
+
+ if (prep->cleanup)
+ prep->cleanup(prep->cleanup_arg);
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -824,7 +978,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
@@ -838,7 +991,6 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
@@ -846,29 +998,19 @@ InitPlan(QueryDesc *queryDesc, int eflags)
int i;
/*
- * Do permissions checks
+ * If ExecutorPrep() was not run earlier (e.g., during plan validation),
+ * perform InitPlan setup: init range table, check permissions, and run
+ * initial pruning. Otherwise, the executor will reuse the same information
+ * in queryDesc->prep->prep_estate.
*/
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ if (queryDesc->prep == NULL)
+ {
+ estate->es_plannedstmt = plannedstmt;
+ estate->es_part_prune_infos = plannedstmt->partPruneInfos;
+ PrepPlan(estate, true);
+ }
+ else
+ Assert(estate == queryDesc->prep->prep_estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f098a5557cf..aedbd9566d6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1281,6 +1281,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 88b150c8d77..187a480e508 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2368,6 +2368,9 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /* Wouldn't be available at ExecutorPrep() time. */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 630d708d2a3..633310c5f5b 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1362,6 +1362,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 653500b38dc..7a3cb944d6f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_list;
+ int i;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_list = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,12 +2619,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ i = 0;
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecPrep *prep = prep_list ?
+ list_nth(prep_list, i) : NULL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ i++;
+
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2690,6 +2699,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ prep,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2bd89102686..d3964a12a14 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,6 +1232,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2033,6 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fde78c55160..82c295502b0 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -66,6 +67,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -78,6 +80,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->prep = prep; /* executor prep output */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -112,6 +115,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /* ExecPrep cleanup if necessary */
+ if (qdesc->prep)
+ {
+ ExecPrepCleanup(qdesc->prep);
+ qdesc->prep = NULL;
+ }
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -123,6 +133,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep: ExecPrep for the plan (output of ExecutorPrep())
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +146,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -146,7 +158,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, prep, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -489,6 +501,9 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->preps ?
+ (ExecPrep *) linitial(portal->preps) :
+ NULL,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1185,6 +1200,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1221,14 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecPrep *prep = portal->preps ?
+ list_nth(portal->preps, i) : NULL;
+
+ i++;
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1286,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1295,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 943da087c9f..313f8ef2fdc 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -298,6 +299,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->preps = preps;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 6e51d50efc7..6aa8b275aa2 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..c18530f5d11 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -18,7 +18,6 @@
#include "nodes/execnodes.h"
#include "tcop/dest.h"
-
/* ----------------
* query descriptor:
*
@@ -35,6 +34,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecPrep *prep; /* output of ExecutorPrep() or NULL */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +57,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..bc90d0ea7ee 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,15 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern ExecPrep *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning);
+extern ExecPrep *CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg);
+extern void ExecPrepCleanup(ExecPrep *prep);
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..f569be3853f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -772,6 +772,61 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
+/*
+ * ExecPrep: encapsulates executor preparation results for a PlannedStmt.
+ *
+ * This is used when we want to perform executor setup steps -- such as
+ * initializing the range table, checking permissions, and executing initial
+ * partition pruning -- ahead of actual plan execution. A typical use case is
+ * in plan validation logic (e.g., when deciding whether to reuse a generic
+ * cached plan), where we need to determine exactly which partitions will be
+ * scanned and locked, without executing the full plan.
+ *
+ * The executor may later adopt the prepared EState (via ExecutorStart),
+ * avoiding redundant setup. In that case, the executor is responsible for
+ * freeing the state and ExecPrepCleanup() will skip it.
+ */
+struct ExecPrep;
+
+/*
+ * Optional callback to clean up user-specific resources associated with
+ * ExecPrep.
+ */
+typedef void (*execprep_cleanup_fn)(void *arg);
+
+/* ExecutorPrep output */
+typedef struct ExecPrep
+{
+ /*
+ * Context in which this struct and all subsidiary allocations were made.
+ * This context must remain alive until ExecPrepCleanup is called.
+ */
+ MemoryContext context;
+
+ /*
+ * Partially-initialized executor state used for permission checks and
+ * pruning. May be adopted directly by ExecutorStart(), in which case
+ * ExecPrepCleanup will skip freeing it.
+ */
+ EState *prep_estate;
+
+ /*
+ * True if ExecPrepCleanup() must free the EState. If the executor adopts
+ * prep_estate, this is set to false to avoid double-free.
+ */
+ bool owns_estate;
+
+ /*
+ * Optional caller-supplied cleanup hook to run during ExecPrepCleanup.
+ * Useful for releasing external resources associated with the prep.
+ */
+ execprep_cleanup_fn cleanup;
+
+ /*
+ * Opaque pointer to pass to the cleanup hook.
+ */
+ void *cleanup_arg;
+} ExecPrep;
/*
* ExecRowMark -
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 5ffa6fd5cc8..013bcc3bd8e 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *preps; /* list of ExecPreps where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2025-11-17 12:50 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2025-11-17 12:50 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Wed, Nov 12, 2025 at 11:17 PM Amit Langote <[email protected]> wrote:
> The key idea is to avoid taking unnecessary locks when reusing a
> cached plan. To achieve that, we need to perform initial partition
> pruning during cached plan reuse in plancache.c so that only surviving
> partitions are locked. This requires some plumbing to reuse the result
> of this "early" pruning during executor startup, because repeating the
> pruning logic would be both inefficient and potentially inconsistent
> -- what if you get different results the second time? (I don't have
> proof that this can happen, but some earlier emails mention the
> theoretical risk, so better to be safe.)
>
> So this patch introduces ExecutorPrep(), which allows executor
> metadata such as initial pruning results (valid subplan indexes) and
> full unpruned_relids to be computed ahead of execution and reused
> later by ExecutorStart() and during QueryDesc setup in parallel
> workers using the results shared by the leader. The parallel query bit
> was discussed previously at [1], though I didn’t have a solution I
> liked then.
>
...
> The patch set is structured as follows:
>
> * Refactor partition pruning initialization (0001): separates the
> setup of the pruning state from its execution by introducing
> ExecCreatePartitionPruneStates(). This makes the pruning logic easier
> to reuse and adds flexibility to do only the setup but skip pruning in
> some cases.
>
> * Introduce ExecutorPrep infrastructure (0002): adds ExecutorPrep()
> and ExecPrep as a formal way to perform executor setup ahead of
> execution. This enables caching or transferring pruning results and
> other metadata without triggering execution. ExecutorStart() can now
> consume precomputed prep state from the EState created during
> ExecutorPrep(). ExecPrepCleanup() handles cleanup when the plan is
> invalidated during prep and so not executed; the state is cleaned up
> in the regular ExecutorEnd() path otherwise.
In the v1 patch, I had not made ExecutorStart() call ExecutorPrep() to
do the prep work (creating the EState, setting up es_relations, checking
permissions) when the QueryDesc did not carry the results of
ExecutorPrep() from some earlier stage. Instead, InitPlan() would
detect that prep was absent and perform the missing setup itself. On
second thought, it is cleaner for ExecutorStart() to detect the absence
of prep and call ExecutorPrep() directly, matching how prep would be
created when coming from plancache et al.
v2 changes the patch to do that.
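The control flow described above can be sketched with a small standalone
model. This is only an illustration under stated assumptions: the Toy*
types and toy_* functions are hypothetical stand-ins, not the real
QueryDesc, ExecPrep, or EState, and the real ExecutorPrep() takes more
arguments and can return NULL when there is nothing to prune.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are the real PostgreSQL structs. */
typedef struct ToyEState { int setup_count; } ToyEState;
typedef struct ToyPrep { ToyEState *estate; bool owns_estate; } ToyPrep;
typedef struct ToyQueryDesc { ToyPrep *prep; ToyEState *estate; } ToyQueryDesc;

/* Models the one-time setup (range table, permissions, initial pruning). */
static ToyPrep *
toy_executor_prep(void)
{
    ToyPrep *prep = malloc(sizeof(ToyPrep));

    assert(prep != NULL);
    prep->estate = calloc(1, sizeof(ToyEState));
    assert(prep->estate != NULL);
    prep->estate->setup_count++;    /* setup happens exactly once */
    prep->owns_estate = true;       /* prep owns the state until adopted */
    return prep;
}

/* Models the v2 rule: if no prep was passed in, create it here. */
static void
toy_executor_start(ToyQueryDesc *qd)
{
    if (qd->prep == NULL)
        qd->prep = toy_executor_prep();

    /* Adopt the prepared state; cleanup must no longer free it. */
    qd->estate = qd->prep->estate;
    qd->prep->owns_estate = false;
}

/* Models the regular end-of-execution path freeing the adopted state. */
static void
toy_executor_end(ToyQueryDesc *qd)
{
    free(qd->estate);
    qd->estate = NULL;
}

/* Frees the state only if the executor never adopted it. */
static void
toy_prep_cleanup(ToyPrep *prep)
{
    if (prep == NULL)
        return;
    if (prep->owns_estate)
        free(prep->estate);
    free(prep);
}
```

With this shape, a caller that never ran prep and a caller that ran it
early (as plancache would) both reach execution with the setup done
exactly once, and the owns_estate flag decides which side frees the
state.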
> * Enable pruning-aware locking in cached / generic plan reuse (0004):
> extends GetCachedPlan() and CheckCachedPlan() to call ExecutorPrep()
> on each PlannedStmt in the CachedPlan, locking only surviving
> partitions. Adds CachedPlanPrepData to pass this through plan cache
> APIs and down to execution via QueryDesc. Also reinstates the
> firstResultRel locking rule added in 28317de72 but later lost due to
> revert of the earlier pruning patch, to ensure correctness when all
> target partitions are pruned.
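As a rough illustration of why locking only the surviving partitions
helps, here is a toy model. It assumes range partitions and an equality
lookup on the partition key; ToyPartition and the toy_* functions are
hypothetical and do not correspond to anything in plancache.c, and the
firstResultRel rule is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy partition: key range [lo, hi) plus a lock flag. */
typedef struct ToyPartition
{
    int  lo;
    int  hi;
    bool locked;
} ToyPartition;

/* "Initial pruning" for key = $1: keep partitions whose range holds key. */
static bool
toy_survives(const ToyPartition *p, int key)
{
    return key >= p->lo && key < p->hi;
}

/* Lock every partition in the plan; returns the number of locks taken. */
static size_t
toy_lock_all(ToyPartition *parts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        parts[i].locked = true;
    return n;
}

/* Lock only partitions that survive pruning; returns locks taken. */
static size_t
toy_lock_surviving(ToyPartition *parts, size_t n, int key)
{
    size_t nlocked = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (toy_survives(&parts[i], key))
        {
            parts[i].locked = true;
            nlocked++;
        }
    }
    return nlocked;
}
```

In this model the cost of toy_lock_all() grows with the partition count
while toy_lock_surviving() takes one lock for a single-key lookup, which
is the effect the benchmark numbers are meant to show.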
Looking at the changes to executor/functions.c, I also noticed that I
had mistakenly allocated the ExecutorPrep state in
SQLFunctionCache.fcontext, whereas the correct context for
execution-related state is SQLFunctionCache.subcontext. In the updated
patch, I've made postquel_start() reparent the prep EState's
es_query_cxt from fcontext to subcontext. I also did not have a test
case that exercised cached plan reuse for SQL functions, so I added one.
I split the functions.c GetCachedPlan() + CachedPlanPrepData plumbing
into a new patch 0005 so it can be reviewed separately, since it is the
only non-mechanical call-site change.
> Benchmark results:
>
> echo "plan_cache_mode = force_generic_plan" >> $PGDATA/postgresql.conf
> for p in 32 64 128 256 512 1024; do pgbench -i --partitions=$p >
> /dev/null 2>&1; echo -ne "$p\t"; pgbench -n -S -T10 -Mprepared | grep
> tps; done
>
> Master
>
> 32 tps = 23841.822407 (without initial connection time)
> 64 tps = 21578.619816 (without initial connection time)
> 128 tps = 18090.500707 (without initial connection time)
> 256 tps = 14152.248201 (without initial connection time)
> 512 tps = 9432.708423 (without initial connection time)
> 1024 tps = 5873.696475 (without initial connection time)
>
> Patched
>
> 32 tps = 24724.245798 (without initial connection time)
> 64 tps = 24858.206407 (without initial connection time)
> 128 tps = 24652.655269 (without initial connection time)
> 256 tps = 23656.756615 (without initial connection time)
> 512 tps = 22299.865769 (without initial connection time)
> 1024 tps = 21911.704317 (without initial connection time)
I re-ran the benchmark to include the 0-partition case and partition
counts above 1024:
echo "plan_cache_mode = force_generic_plan" >> $PGDATA/postgresql.conf
for p in 0 8 16 32 64 128 256 512 1024 2048 4096; do
  pgbench -i --partitions=$p > /dev/null 2>&1
  echo -ne "$p\t"
  pgbench -n -S -T10 -Mprepared | grep tps
done
Master
0 tps = 23869.906654 (without initial connection time)
8 tps = 22682.498914 (without initial connection time)
16 tps = 22714.445711 (without initial connection time)
32 tps = 21653.589371 (without initial connection time)
64 tps = 20571.267545 (without initial connection time)
128 tps = 17138.088269 (without initial connection time)
256 tps = 13027.168426 (without initial connection time)
512 tps = 8689.486966 (without initial connection time)
1024 tps = 5450.525617 (without initial connection time)
2048 tps = 3034.383108 (without initial connection time)
4096 tps = 1560.110609 (without initial connection time)
Patched
0 tps = 23600.068719 (without initial connection time)
8 tps = 22548.439906 (without initial connection time)
16 tps = 22807.337363 (without initial connection time)
32 tps = 22837.789996 (without initial connection time)
64 tps = 22915.846820 (without initial connection time)
128 tps = 22958.472655 (without initial connection time)
256 tps = 22432.432730 (without initial connection time)
512 tps = 20327.618690 (without initial connection time)
1024 tps = 20554.932475 (without initial connection time)
2048 tps = 19947.061061 (without initial connection time)
4096 tps = 17294.369829 (without initial connection time)
In tabular format (positive pct_change means patched is better):
partitions  master    patched   pct_change
------------------------------------------
0           23869.91  23600.07  -1.1%
8           22682.50  22548.44  -0.6%
16          22714.45  22807.34  +0.4%
32          21653.59  22837.79  +5.5%
64          20571.27  22915.85  +11.4%
128         17138.09  22958.47  +34.0%
256         13027.17  22432.43  +72.2%
512         8689.49   20327.62  +133.9%
1024        5450.53   20554.93  +277.1%
2048        3034.38   19947.06  +557.4%
4096        1560.11   17294.37  +1008.5%
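For anyone checking the pct_change column: it is (patched - master) / master * 100, so positive values mean the patched build is faster. A minimal C sketch of that arithmetic follows; the function names pct_change() and print_row() are made up for illustration and are not part of the patch set or of pgbench.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Relative change of patched over master, in percent.
 * Positive means the patched build had higher tps.
 * (Hypothetical helper, for illustration only.)
 */
double
pct_change(double master_tps, double patched_tps)
{
    assert(master_tps > 0.0);
    return (patched_tps - master_tps) / master_tps * 100.0;
}

/* Print one row: partition count, both tps values, and the change. */
void
print_row(int partitions, double master_tps, double patched_tps)
{
    printf("%-10d  %-8.2f  %-8.2f  %+.1f%%\n",
           partitions, master_tps, patched_tps,
           pct_change(master_tps, patched_tps));
}
```

For example, pct_change(21653.59, 22837.79) evaluates to roughly 5.5, which agrees with the 32-partition row.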
I also did some runs for custom plans. The custom plan path should
behave about the same on master and patched since the early
ExecutorPrep() business only applies to generic plan reuse cases.
echo "plan_cache_mode = force_custom_plan" >> $PGDATA/postgresql.conf
for p in 0 8 16 32 64 128 256 512 1024 2048 4096; do
  pgbench -i --partitions=$p > /dev/null 2>&1
  echo -ne "$p\t"
  pgbench -n -S -T10 -Mprepared | grep tps
done
Master
0 tps = 22346.419557 (without initial connection time)
8 tps = 20959.115560 (without initial connection time)
16 tps = 21390.573290 (without initial connection time)
32 tps = 21358.292393 (without initial connection time)
64 tps = 21288.742635 (without initial connection time)
128 tps = 21167.721447 (without initial connection time)
256 tps = 21256.618661 (without initial connection time)
512 tps = 19401.261197 (without initial connection time)
1024 tps = 19169.135145 (without initial connection time)
2048 tps = 19504.102179 (without initial connection time)
4096 tps = 18880.855783 (without initial connection time)
Patched
0 tps = 22852.634752 (without initial connection time)
8 tps = 21596.432690 (without initial connection time)
16 tps = 21428.779996 (without initial connection time)
32 tps = 20629.225272 (without initial connection time)
64 tps = 21301.644733 (without initial connection time)
128 tps = 21098.543942 (without initial connection time)
256 tps = 21394.364662 (without initial connection time)
512 tps = 19475.152170 (without initial connection time)
1024 tps = 19585.768438 (without initial connection time)
2048 tps = 19810.211969 (without initial connection time)
4096 tps = 19160.981608 (without initial connection time)
In tabular format:
partitions  master    patched   pct_change
------------------------------------------
0           22346.42  22852.63  +2.3%
8           20959.12  21596.43  +3.0%
16          21390.57  21428.78  +0.2%
32          21358.29  20629.23  -3.4%
64          21288.74  21301.64  +0.1%
128         21167.72  21098.54  -0.3%
256         21256.62  21394.36  +0.6%
512         19401.26  19475.15  +0.4%
1024        19169.14  19585.77  +2.2%
2048        19504.10  19810.21  +1.6%
4096        18880.86  19160.98  +1.5%
Numbers look within noise range as expected.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v2-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (6.5K, 2-v2-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From eef8d1af46ca8deefbf8eb95428d37fc900a0944 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 17 Nov 2025 17:40:26 +0900
Subject: [PATCH v2 5/5] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per statement
ExecPrep pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per call context hierarchy.
This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users, which a later
patch will use to support pruning aware locking.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 33 +++++++++++++++++++++++--
src/test/regress/expected/plancache.out | 31 +++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 29 ++++++++++++++++++++++
3 files changed, 91 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 633310c5f5b..ed7352fce61 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ ExecPrep *prep; /* ExecutorPrep() output for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ int i;
/*
* Clean up after previous query, if there was one.
@@ -695,10 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
- NULL);
+ NULL,
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -719,9 +732,12 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ i = 0;
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ ExecPrep *prep = cprep.prep_list ? list_nth(cprep.prep_list, i++) :
+ NULL;
execution_state *newes;
/*
@@ -763,6 +779,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep = prep;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1361,8 +1378,20 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ if (es->prep)
+ {
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per call lifetime.
+ */
+ EState *prep_estate = es->prep->prep_estate;
+
+ MemoryContextSetParent(prep_estate->es_query_cxt, fcache->subcontext);
+ }
+
es->qd = CreateQueryDesc(es->stmt,
- NULL,
+ es->prep,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..8c68691df91 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,34 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..56ebbbdecd2 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,32 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+
+reset plan_cache_mode;
--
2.47.3
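As a rough illustration of the init_execution_state() change in the patch above: each statement in the cached plan is paired with the ExecPrep at the same position in cprep.prep_list, or with NULL when that list is empty. The sketch below models the pairing with plain arrays. ToyPrep, ToyExecState and toy_build_exec_states() are hypothetical stand-ins, not PostgreSQL types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ExecutorPrep() result. */
typedef struct ToyPrep
{
    int         unpruned_count;
} ToyPrep;

/* Hypothetical stand-in for one execution_state entry. */
typedef struct ToyExecState
{
    int         stmt_id;        /* which statement this entry runs */
    const ToyPrep *prep;        /* matching prep result, or NULL */
} ToyExecState;

/*
 * Fill out[] with one entry per statement.  "preps" may be NULL, meaning
 * no prep results are available; otherwise it must hold nstmts entries in
 * the same order as stmt_ids.
 */
void
toy_build_exec_states(const int *stmt_ids, const ToyPrep *preps,
                      size_t nstmts, ToyExecState *out)
{
    assert(stmt_ids != NULL && out != NULL);

    for (size_t i = 0; i < nstmts; i++)
    {
        out[i].stmt_id = stmt_ids[i];
        out[i].prep = preps ? &preps[i] : NULL;
    }
}
```

The point of the NULL fallback is that callers downstream can treat "no prep state" as the ordinary case and only take the reparenting path when a prep result is actually attached.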
[application/octet-stream] v2-0001-Refactor-partition-pruning-initialization-for-cla.patch (7.7K, 3-v2-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 243d407de86b0a73b9bd8c8dbc541f630eb33747 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:18:24 +0900
Subject: [PATCH v2 1/5] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Also simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++-----------
src/include/executor/execPartition.h | 1 +
2 files changed, 43 insertions(+), 28 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index aa12e9ad2ea..88b150c8d77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1772,6 +1771,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1796,6 +1798,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1803,11 +1828,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1825,20 +1850,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -1846,8 +1864,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -1965,14 +1981,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2206,8 +2220,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2219,8 +2233,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3b3f46aced0..ba8cc594fc9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
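To illustrate the split described in the 0001 commit message, here is a toy model of the two phases: one pass that only builds per-node pruning state, and a separate pass that walks those states and records which subplans survive. All names (ToyPruneInfo, ToyPruneState, ToyEState and the toy_* functions) are hypothetical, and subplan sets are simplified to bit masks. In particular, the real code stores NULL rather than a full set when no initial pruning is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_NODES 8

/* Hypothetical stand-in for PartitionPruneInfo. */
typedef struct ToyPruneInfo
{
    bool        has_initial_steps;  /* any "initial" pruning steps? */
    unsigned    surviving_mask;     /* subplans surviving those steps */
    unsigned    all_mask;           /* every subplan under the node */
} ToyPruneInfo;

/* Hypothetical stand-in for PartitionPruneState. */
typedef struct ToyPruneState
{
    const ToyPruneInfo *info;
    bool        do_initial_prune;
} ToyPruneState;

typedef struct ToyEState
{
    const ToyPruneInfo *infos;
    size_t      ninfos;
    ToyPruneState states[TOY_MAX_NODES];
    size_t      nstates;
    unsigned    results[TOY_MAX_NODES];
    size_t      nresults;
} ToyEState;

/* Phase 1: build one state per info.  No pruning happens here. */
void
toy_create_prune_states(ToyEState *estate)
{
    assert(estate->ninfos <= TOY_MAX_NODES);
    estate->nstates = 0;
    for (size_t i = 0; i < estate->ninfos; i++)
    {
        ToyPruneState *st = &estate->states[estate->nstates++];

        st->info = &estate->infos[i];
        st->do_initial_prune = estate->infos[i].has_initial_steps;
    }
}

/* Phase 2: walk the states from phase 1 and record surviving subplans. */
void
toy_do_initial_pruning(ToyEState *estate)
{
    assert(estate->nresults == 0);  /* must not have run already */
    for (size_t i = 0; i < estate->nstates; i++)
    {
        const ToyPruneState *st = &estate->states[i];

        estate->results[estate->nresults++] =
            st->do_initial_prune ? st->info->surviving_mask
                                 : st->info->all_mask;
    }
}
```

A caller that only needs the pruning metadata can stop after toy_create_prune_states(); the results array stays empty until toy_do_initial_pruning() is called.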
[application/octet-stream] v2-0004-Use-pruning-aware-locking-in-cached-plans.patch (24.0K, 4-v2-0004-Use-pruning-aware-locking-in-cached-plans.patch)
From 74dc075dc8f844e036fc38e005fc512b6dd54bc9 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v2 4/5] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry ExecutorPrep results
through the plan caching layer. Adjust call sites in SPI,
functions, portals, and EXPLAIN to propagate this data.
This ensures pruning decisions made during initial pruning are
consistently reused without redoing pruning logic in executor paths
like parallel workers. It also lays the groundwork for
pruning-dependent lock behavior during plan reuse.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
---
src/backend/commands/prepare.c | 19 +-
src/backend/executor/nodeModifyTable.c | 4 +-
src/backend/executor/spi.c | 26 ++-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 3 +
src/backend/tcop/postgres.c | 9 +-
src/backend/utils/cache/plancache.c | 234 ++++++++++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 ++
src/include/utils/plancache.h | 24 ++-
10 files changed, 312 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index afd449c73ba..23332d19b37 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ /* Keep ExecutorPrep state with the portal and its resowner. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +209,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_list,
cplan);
/*
@@ -575,6 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_list;
ListCell *p;
@@ -633,8 +638,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ /* ExecutorPrep state is local to this EXPLAIN EXECUTE call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/* Explain each query */
i = 0;
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4c5647ac38a..c5812612f8d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4648,8 +4648,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 7a3cb944d6f..d580f1e0425 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ /* ExecutorPrep state lives in this portal's context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1690,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_list, /* lives in portalContext */
cplan);
/*
@@ -2078,6 +2083,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2107,13 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ /* ExecutorPrep() state lives in caller's active context. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2501,6 +2511,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ CachedPlanPrepData cprep = {0};
List *prep_list;
int i;
@@ -2577,11 +2588,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+
+ /* ExecutorPrep state is per _SPI_execute_plan call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c4fd646b999..4c76e78c1da 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -608,6 +608,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ccdc9bc264a..229b39060ae 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1274,6 +1274,9 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
lappend_int(root->glob->resultRelations,
splan->rootRelation);
}
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels,
+ linitial_int(splan->resultRelations));
}
break;
case T_Append:
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d3964a12a14..249829f59a0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1639,6 +1639,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2021,7 +2022,11 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+
+ /* ExecutorPrep() state lives in portal context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2034,7 +2039,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_list,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 6661d2c6b73..c1cfd47422c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,7 +93,7 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
@@ -101,6 +101,8 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -137,6 +139,26 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
+/*
+ * Lock acquisition policy for execution locks.
+ *
+ * LOCK_ALL acquires locks on all relations mentioned in the plan,
+ * reproducing the behavior of AcquireExecutorLocks().
+ *
+ * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
+ * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
+ * partitions remaining after performing initial pruning.
+ */
+typedef enum LockPolicy
+{
+ LOCK_ALL,
+ LOCK_UNPRUNED,
+} LockPolicy;
+
+static void AcquireExecutorLocksWithPolicy(List *stmt_list,
+ LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep);
+
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -938,7 +960,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ *
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The resulting ExecPrep structures are saved in cprep for
+ * later reuse by ExecutorStart().
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
@@ -947,7 +974,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -975,13 +1002,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1003,7 +1032,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
}
/*
@@ -1283,6 +1312,10 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the results in
+ * cprep->prep_list. These are intended to be passed later to ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1293,7 +1326,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1315,7 +1349,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (PrepAndCheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1902,6 +1938,38 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
+/*
+ * AcquireExecutorLocksWithPolicy
+ * Acquire or release execution locks for a cached plan according to
+ * the specified policy.
+ *
+ * LOCK_ALL reproduces AcquireExecutorLocks(), locking every relation in
+ * each PlannedStmt's rtable. LOCK_UNPRUNED restricts locking to the
+ * unprunable rels and partitions that survive initial runtime pruning.
+ *
+ * When LOCK_UNPRUNED is used on acquire, ExecutorPrep() is invoked for
+ * each PlannedStmt and the resulting ExecPrep pointers are appended to
+ * cprep->prep_list in cprep->context. On release, the same ExecPrep
+ * list is consulted to determine which relations to unlock and is then
+ * cleaned up with ExecPrepCleanup().
+ */
+static void
+AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ switch (policy)
+ {
+ case LOCK_ALL:
+ AcquireExecutorLocks(stmt_list, acquire);
+ break;
+ case LOCK_UNPRUNED:
+ AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
+ break;
+ default:
+ elog(ERROR, "invalid LockPolicy");
+ }
+}
+
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
@@ -1954,6 +2022,158 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the ExecPrep pointer for each PlannedStmt to cprep->prep_list
+ *
+ * On release, it:
+ * - looks up the ExecPrep object for each PlannedStmt from cprep->prep_list
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ * - calls ExecPrepCleanup() on each ExecPrep
+ *
+ * prep_list is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for ExecPrep happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_list;
+ int i;
+
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+ /*
+ * When releasing locks, use the ExecPrep list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_list = cprep->prep_list;
+ Assert(acquire || list_length(prep_list) == list_length(stmt_list));
+
+ i = 0;
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecPrep *prep;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocks(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_list = lappend(cprep->prep_list, NULL);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ prep = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true,
+ cprep->eflags);
+ Assert(prep || plannedstmt->partPruneInfos == NULL);
+ cprep->prep_list = lappend(cprep->prep_list, prep);
+ }
+ else
+ prep = list_nth(prep_list, i++);
+
+ Assert(prep == NULL || prep->prep_estate);
+ if (prep)
+ {
+ EState *prep_estate = prep->prep_estate;
+
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * firstResultRels may contain pruned partitions that must still be
+ * locked to satisfy executor assumptions (see comments in
+ * ExecInitModifyTable()). Ensure they're included here.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+
+ /* Clean up prep if releasing locks. */
+ if (!acquire)
+ ExecPrepCleanup(prep);
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 30d889b54c5..6fb86dc05f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -141,6 +141,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..eb211f1ba56 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a82b66d4bc2..c7b8ec4be39 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,27 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_list is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. The
+ * same list is later used to determine which relations to unlock when
+ * releasing execution locks.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_list; /* one ExecPrep per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate ExecPrep objects */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to pass to ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +261,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.47.3
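
The relation set that AcquireExecutorLocksUnpruned() hands to its second LockRelids() call in the patch above can be illustrated with plain bit operations. This is only a sketch under stated assumptions, not code from the patch: RelidSet, relid_bit() and second_pass_lock_relids() are hypothetical names, and a 64-bit mask stands in for Bitmapset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit N set means RT index N is a member. */
typedef uint64_t RelidSet;

static RelidSet
relid_bit(unsigned rti)
{
    assert(rti < 64);           /* the sketch only handles small RT indexes */
    return (RelidSet) 1 << rti;
}

/*
 * Mirrors the set arithmetic in the patch: start from the relids that
 * survived initial pruning, drop the unprunable ones (already locked in the
 * first pass), then add each ModifyTable's first result relation if missing.
 */
static RelidSet
second_pass_lock_relids(RelidSet unpruned, RelidSet unprunable,
                        const unsigned *first_result_rels, int nfirst)
{
    RelidSet    lock_relids = unpruned & ~unprunable;

    for (int i = 0; i < nfirst; i++)
        lock_relids |= relid_bit(first_result_rels[i]);

    return lock_relids;
}
```

With unprunable = {1}, unpruned = {1, 3} and a pruned first result relation at RT index 2, the second pass covers {2, 3}; without any first result relations it covers only {3}.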
[application/octet-stream] v2-0003-Reuse-partition-pruning-results-in-parallel-worke.patch (9.1K, 5-v2-0003-Reuse-partition-pruning-results-in-parallel-worke.patch)
From d9d95e09961dcb8236e5fe7b2da4a37fda8e5944 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v2 3/5] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce ExecCheckInitialPruningResults() to verify that the results
match what the worker would compute. This check helps catch
inconsistencies across leader and worker pruning logic.
While valuable on its own, this change also lays the foundation for
future optimizations where the leader may take locks only on
surviving partitions. Ensuring that workers follow identical pruning
decisions makes such selective locking safe.
---
src/backend/executor/execParallel.c | 67 +++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 35 +++++++++++++++
src/include/executor/execPartition.h | 1 +
3 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aedbd9566d6..751590adcc9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -65,6 +66,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -608,12 +611,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -642,6 +651,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -668,6 +679,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -769,6 +790,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1263,10 +1294,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ ExecPrep *prep = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1279,9 +1315,38 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false, 0);
+ Assert(prep->prep_estate);
+
+ prep->prep_estate->es_part_prune_results = part_prune_results;
+ prep->prep_estate->es_unpruned_relids =
+ bms_add_members(prep->prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * Verify that the pruning results passed from the leader match
+ * what the worker would independently compute.
+ */
+ ExecCheckInitialPruningResults(prep->prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
- NULL,
+ prep,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 187a480e508..3b450e3373f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1872,6 +1872,41 @@ ExecDoInitialPruning(EState *estate)
}
}
+/*
+ * ExecCheckInitialPruningResults
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ */
+void
+ExecCheckInitialPruningResults(EState *estate)
+{
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (bms_nonempty_difference(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unpruned_relids in parallel worker");
+ }
+}
+
/*
* ExecInitPartitionExecPruning
* Initialize the data structures needed for runtime "exec" partition
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index ba8cc594fc9..126efd008e5 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -132,6 +132,7 @@ typedef struct PartitionPruneState
extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
+extern void ExecCheckInitialPruningResults(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
--
2.47.3
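
The comparison in ExecCheckInitialPruningResults() above relies on bms_nonempty_difference(a, b), which reports whether a has members that are not in b. A rough sketch of that one-directional test follows, using a 64-bit mask in place of Bitmapset. SubplanSet, subplan_bit(), has_members_outside() and worker_result_accepted() are illustration-only names, not part of the patch or of PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit N set means subplan N is valid. */
typedef uint64_t SubplanSet;

static SubplanSet
subplan_bit(unsigned idx)
{
    assert(idx < 64);           /* the sketch only handles small indexes */
    return (SubplanSet) 1 << idx;
}

/* Same question bms_nonempty_difference(a, b) answers: is a \ b non-empty? */
static bool
has_members_outside(SubplanSet a, SubplanSet b)
{
    return (a & ~b) != 0;
}

/*
 * The worker's own result passes when every subplan it considers valid was
 * also considered valid by the leader.
 */
static bool
worker_result_accepted(SubplanSet worker, SubplanSet leader)
{
    return !has_members_outside(worker, leader);
}
```

Under this test a worker whose result is a subset of the leader's passes, while a worker that keeps a subplan the leader pruned is rejected.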
[application/octet-stream] v2-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (28.7K, 6-v2-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From 11e0262e31e35539f50e96531559db6cd7e32160 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v2 2/5] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present.
ExecutorStart() now expects QueryDesc->prep to point at such an
ExecPrep object. If no prep was supplied by the caller, it
invokes ExecutorPrep() itself and adopts the resulting EState
for the duration of the query. This keeps the executor startup
behaviour unchanged while making the setup work callable
separately when needed.
CreateQueryDesc() grows a prep argument and stores it in the
QueryDesc. Portals, SPI, SQL functions, and EXPLAIN are wired
to carry an optional ExecPrep pointer alongside the PlannedStmt
list, but most callers still pass NULL and let ExecutorStart()
perform the setup lazily.
Add the ExecPrep struct and ExecPrepCleanup() to encapsulate
ownership of the prepared EState and any caller-specific
cleanup hook. Update executor/README and related comments to
document the new control flow and the separation between
preparation and execution.
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 11 +-
src/backend/executor/README | 8 +-
src/backend/executor/execMain.c | 179 +++++++++++++++++++++++----
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 3 +
src/backend/executor/functions.c | 1 +
src/backend/executor/spi.c | 10 ++
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 27 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 3 +-
src/include/executor/executor.h | 11 ++
src/include/nodes/execnodes.h | 48 +++++++
src/include/utils/portal.h | 2 +
21 files changed, 286 insertions(+), 40 deletions(-)
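
The ownership hand-off described in the commit message (ExecutorStart() adopts the prepared EState, and ExecPrepCleanup() frees it only when nobody adopted it) can be sketched in isolation. The names below (SketchState, SketchPrep, sketch_prep(), sketch_start(), sketch_prep_cleanup(), states_freed) are hypothetical stand-ins using malloc/free rather than memory contexts. The sketch also frees the wrapper itself, which is an assumption of the sketch and not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for EState and ExecPrep. */
typedef struct SketchState
{
    int         unused;
} SketchState;

typedef struct SketchPrep
{
    SketchState *state;
    bool        owns_state;     /* false once the executor has adopted it */
} SketchPrep;

static int  states_freed = 0;   /* lets callers observe who freed what */

/* Like ExecutorPrep(): build the state and wrap it, initially owned. */
static SketchPrep *
sketch_prep(void)
{
    SketchPrep *prep = malloc(sizeof(SketchPrep));

    assert(prep != NULL);
    prep->state = malloc(sizeof(SketchState));
    assert(prep->state != NULL);
    prep->owns_state = true;
    return prep;
}

/* Like ExecutorStart(): prepare lazily if needed, then adopt the state. */
static SketchState *
sketch_start(SketchPrep **prep)
{
    if (*prep == NULL)
        *prep = sketch_prep();
    (*prep)->owns_state = false;
    return (*prep)->state;
}

/* Like ExecPrepCleanup(): free the state only if nobody adopted it. */
static void
sketch_prep_cleanup(SketchPrep *prep)
{
    if (prep == NULL)
        return;
    if (prep->state && prep->owns_state)
    {
        free(prep->state);
        states_freed++;
    }
    free(prep);
}
```

A prep that is cleaned up without ever reaching sketch_start() frees its state, whereas one that was adopted leaves the state for its new owner to release.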
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..5efbb0949c2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -870,7 +870,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 1ccc2e55c64..9eabe4920cd 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -334,7 +334,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e699f8595e..d6ab3697dd9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -548,7 +549,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, prep, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 93ef1ad106f..3cca6d45ec1 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -993,6 +993,7 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ef7c0d624f1..30cbf9f264f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -437,7 +437,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index ec96c2efcd3..ac1ddd25aba 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL),
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 34b6410d6a2..afd449c73ba 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,6 +576,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_list;
ListCell *p;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -585,6 +587,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i;
if (es->memory)
{
@@ -650,14 +653,20 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_list = NIL;
/* Explain each query */
+ i = 0;
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecPrep *prep = prep_list ?
+ (ExecPrep *) list_nth(prep_list, i) : NULL;
+ i++;
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..95b5ec58c55 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,10 +291,16 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Performs range
+ table initialization, permission checks, and initial partition pruning.
+ Returns an ExecPrep wrapper with EState that ExecutorStart may reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
+ CreateExecutorState (or reuse one from ExecPrep if present)
creates per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..39de0b93a1c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -171,8 +171,26 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
*/
- estate = CreateExecutorState();
+ if (queryDesc->prep == NULL)
+ queryDesc->prep = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ Assert(queryDesc->prep);
+ estate = queryDesc->prep->prep_estate;
+
+ /*
+ * Executor is adopting the prep's EState. Mark it so ExecPrepCleanup()
+ * doesn't try to free it redundantly.
+ */
+ queryDesc->prep->owns_estate = false;
+
queryDesc->estate = estate;
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +281,136 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart().
+ *
+ * Returns an ExecPrep wrapper that owns the EState and can be reused
+ * or cleaned up later.
+ */
+ExecPrep *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ bool snapshot_set;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Pruning may use expressions that require an active snapshot. */
+ snapshot_set = false;
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snapshot_set = true;
+ }
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false -- they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ /* Release snapshot if we got one */
+ if (snapshot_set)
+ PopActiveSnapshot();
+
+ return CreateExecPrep(estate, CurrentMemoryContext, NULL, NULL);
+}
+
+/*
+ * CreateExecPrep: initialize ExecPrep wrapper with optional cleanup metadata.
+ */
+ExecPrep *
+CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg)
+{
+ ExecPrep *prep = palloc0(sizeof(ExecPrep));
+
+ prep->prep_estate = estate;
+ prep->context = context;
+ prep->cleanup = cleanup;
+ prep->cleanup_arg = cleanup_arg;
+ prep->owns_estate = true;
+
+ return prep;
+}
+
+/*
+ * ExecPrepCleanup: free ExecPrep resources not adopted by the executor.
+ *
+ * Only frees the EState if it wasn't taken over by ExecutorStart().
+ * Always runs the optional user-defined cleanup callback.
+ */
+void
+ExecPrepCleanup(ExecPrep *prep)
+{
+ if (prep == NULL)
+ return;
+
+ if (prep->prep_estate && prep->owns_estate)
+ {
+ ExecCloseRangeTableRelations(prep->prep_estate);
+ FreeExecutorState(prep->prep_estate);
+ }
+
+ if (prep->cleanup)
+ prep->cleanup(prep->cleanup_arg);
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -824,7 +972,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
@@ -838,37 +985,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->prep);
+ Assert(estate == queryDesc->prep->prep_estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f098a5557cf..aedbd9566d6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1281,6 +1281,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 88b150c8d77..187a480e508 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2368,6 +2368,9 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /* Wouldn't be available at ExecutorPrep() time. */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 630d708d2a3..633310c5f5b 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1362,6 +1362,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 653500b38dc..7a3cb944d6f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_list;
+ int i;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_list = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,12 +2619,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ i = 0;
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecPrep *prep = prep_list ?
+ list_nth(prep_list, i) : NULL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ i++;
+
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2690,6 +2699,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ prep,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2bd89102686..d3964a12a14 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,6 +1232,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2033,6 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fde78c55160..82c295502b0 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -66,6 +67,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -78,6 +80,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->prep = prep; /* executor prep output */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -112,6 +115,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /* ExecPrep cleanup if necessary */
+ if (qdesc->prep)
+ {
+ ExecPrepCleanup(qdesc->prep);
+ qdesc->prep = NULL;
+ }
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -123,6 +133,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep: ExecPrep for the plan (output of ExecutorPrep())
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +146,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -146,7 +158,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, prep, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -489,6 +501,9 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->preps ?
+ (ExecPrep *) linitial(portal->preps) :
+ NULL,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1185,6 +1200,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1221,14 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecPrep *prep = portal->preps ?
+ list_nth(portal->preps, i) : NULL;
+
+ i++;
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1286,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1295,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 943da087c9f..313f8ef2fdc 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -298,6 +299,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->preps = preps;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 6e51d50efc7..6aa8b275aa2 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..c18530f5d11 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -18,7 +18,6 @@
#include "nodes/execnodes.h"
#include "tcop/dest.h"
-
/* ----------------
* query descriptor:
*
@@ -35,6 +34,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecPrep *prep; /* output of ExecutorPrep() or NULL */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +57,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..3579926d4e8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,16 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern ExecPrep *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning,
+ int eflags);
+extern ExecPrep *CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg);
+extern void ExecPrepCleanup(ExecPrep *prep);
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..8bdecd631bf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -772,6 +772,54 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
+/*
+ * ExecPrep: encapsulates executor preparation results for a PlannedStmt.
+ *
+ * ExecutorPrep() factors out executor setup steps such as initializing the
+ * range table, checking permissions, and executing initial partition pruning.
+ * ExecutorStart() can reuse the prepared EState instead of repeating that
+ * work, and other callers (such as plan cache validation) can use it without
+ * running the full plan.
+ */
+
+/*
+ * Optional callback to clean up user-specific resources associated with
+ * ExecPrep.
+ */
+typedef void (*execprep_cleanup_fn)(void *arg);
+
+typedef struct ExecPrep
+{
+ /*
+ * Context in which this struct and all subsidiary allocations were made.
+ * This context must remain alive until ExecPrepCleanup is called.
+ */
+ MemoryContext context;
+
+ /*
+ * Partially-initialized executor state used for permission checks and
+ * pruning. May be adopted directly by ExecutorStart(), in which case
+ * ExecPrepCleanup will skip freeing it.
+ */
+ EState *prep_estate;
+
+ /*
+ * True if ExecPrepCleanup() must free the EState. If the executor adopts
+ * prep_estate, this is set to false to avoid double-free.
+ */
+ bool owns_estate;
+
+ /*
+ * Optional caller-supplied cleanup hook to run during ExecPrepCleanup.
+ * Useful for releasing external resources associated with the prep.
+ */
+ execprep_cleanup_fn cleanup;
+
+ /*
+ * Opaque pointer to pass to the cleanup hook.
+ */
+ void *cleanup_arg;
+} ExecPrep;
/*
* ExecRowMark -
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 5ffa6fd5cc8..013bcc3bd8e 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *preps; /* list of ExecPreps where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2025-11-20 07:30 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 2 replies; 82+ messages in thread
From: Amit Langote @ 2025-11-20 07:30 UTC (permalink / raw)
To: Tom Lane <[email protected]>; +Cc: Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Mon, Nov 17, 2025 at 9:50 PM Amit Langote <[email protected]> wrote:
> On Wed, Nov 12, 2025 at 11:17 PM Amit Langote <[email protected]> wrote:
> > * Enable pruning-aware locking in cached / generic plan reuse (0004):
> > extends GetCachedPlan() and CheckCachedPlan() to call ExecutorPrep()
> > on each PlannedStmt in the CachedPlan, locking only surviving
> > partitions. Adds CachedPlanPrepData to pass this through plan cache
> > APIs and down to execution via QueryDesc. Also reinstates the
> > firstResultRel locking rule added in 28317de72 but later lost due to
> > revert of the earlier pruning patch, to ensure correctness when all
> > target partitions are pruned.
>
> Looking at the changes to executor/functions.c, I also noticed that I
> had mistakenly allocated the ExecutorPrep state in
> SQLFunctionCache.fcontext whereas the correct context for execution
> related state is SQLFunctionCache.subcontext. In the updated patch,
> I've made postquel_start() reparent the prep EState's es_query_cxt to
> subcontext from fcontext. I also did not have a test case that
> exercised cached plan reuse for SQL functions, so I added one. I split
> the functions.c GetCachedPlan() + CachedPlanPrepData plumbing into a
> new patch 0005 so it can be reviewed separately, since it is the only
> non-mechanical call-site change.
I also noticed a bug in the prep cleanup logic that runs when a cached
plan becomes invalid during the prep phase. Patch 0005 fixes that and
adds a regression test that exercises the invalidation path. This will
be folded into 0004 later.
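
For readers following the cleanup discussion, the ownership rule that
ExecPrepCleanup() in the patch upthread follows can be sketched in
isolation. This is a minimal standalone illustration, not code from the
patch: SketchEState, SketchPrep and the sketch_* functions are
hypothetical stand-ins, and the "adopt" step only models what the
owns_estate comment says happens when the executor takes over
prep_estate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for EState; it only counts how often it was freed. */
typedef struct SketchEState
{
	int			free_count;
} SketchEState;

typedef void (*sketch_cleanup_fn) (void *arg);

/* Hypothetical mirror of the ExecPrep fields that matter for cleanup. */
typedef struct SketchPrep
{
	SketchEState *prep_estate;
	bool		owns_estate;
	sketch_cleanup_fn cleanup;
	void	   *cleanup_arg;
} SketchPrep;

static void
sketch_free_estate(SketchEState *estate)
{
	estate->free_count++;
}

/* The executor takes over the EState, so the prep must no longer free it. */
static SketchEState *
sketch_adopt_estate(SketchPrep *prep)
{
	prep->owns_estate = false;
	return prep->prep_estate;
}

/* Same shape as ExecPrepCleanup(): free only if still owned, then run hook. */
static void
sketch_prep_cleanup(SketchPrep *prep)
{
	if (prep == NULL)
		return;

	if (prep->prep_estate && prep->owns_estate)
		sketch_free_estate(prep->prep_estate);

	if (prep->cleanup)
		prep->cleanup(prep->cleanup_arg);
}

/* Example hook: counts invocations through the opaque argument. */
static void
sketch_count_cleanup(void *arg)
{
	(*(int *) arg)++;
}
```

In this sketch, a prep that was never adopted frees its state exactly
once, while an adopted one leaves the state to its new owner; the hook
runs in both cases.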
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v3-0004-Use-pruning-aware-locking-in-cached-plans.patch (24.5K, 2-v3-0004-Use-pruning-aware-locking-in-cached-plans.patch)
From dc0de03510539ddc3bd33327158785279356821f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v3 4/6] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry ExecutorPrep results
through the plan caching layer. Adjust call sites in SPI,
functions, portals, and EXPLAIN to propagate this data.
This ensures pruning decisions made during initial pruning are
consistently reused without redoing pruning logic in executor paths
like parallel workers. It also lays the groundwork for
pruning-dependent lock behavior during plan reuse.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior that was added in commit
28317de72 but lost when the earlier pruning patch was reverted. That
commit required the first ModifyTable target to remain initialized for
executor assumptions to hold. We now explicitly track these relids in
PlannerGlobal and PlannedStmt so they are locked even if pruned,
preserving that rule across cached plan reuse.
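
The locking rule described above can be sketched in isolation. This is
a minimal standalone illustration, not code from this patch:
SketchRelids and the sketch_* functions are hypothetical stand-ins for
Bitmapset and the plancache helpers, and the 64-bit mask assumes range
table indexes below 64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relid set: bit i set means range table index i is a member. */
typedef uint64_t SketchRelids;

static SketchRelids
sketch_relid_bit(int rti)
{
	assert(rti >= 0 && rti < 64);
	return (SketchRelids) 1 << rti;
}

/*
 * Relations to lock when reusing a generic plan: the unprunable rels,
 * the partitions that survived initial pruning, and the first result
 * relation of each ModifyTable even if pruning removed it.
 */
static SketchRelids
sketch_relids_to_lock(SketchRelids unprunable, SketchRelids surviving,
					  const int *first_result_rels, int nfirst)
{
	SketchRelids result = unprunable | surviving;

	for (int i = 0; i < nfirst; i++)
		result |= sketch_relid_bit(first_result_rels[i]);

	return result;
}
```

For example, with rel 1 unprunable, partition 3 surviving pruning, and
rel 2 being the first result relation, the sketch locks rels 1, 2 and 3
and leaves any other pruned partition unlocked.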
---
src/backend/commands/prepare.c | 19 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 4 +-
src/backend/executor/spi.c | 26 ++-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 3 +
src/backend/tcop/postgres.c | 9 +-
src/backend/utils/cache/plancache.c | 234 ++++++++++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 ++
src/include/utils/plancache.h | 24 ++-
11 files changed, 313 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index afd449c73ba..23332d19b37 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ /* Keep ExecutorPrep state with the portal and its resowner. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +209,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_list,
cplan);
/*
@@ -575,6 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_list;
ListCell *p;
@@ -633,8 +638,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ /* ExecutorPrep state is local to this EXPLAIN EXECUTE call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/* Explain each query */
i = 0;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 633310c5f5b..d81718ea84e 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -698,6 +698,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4c5647ac38a..c5812612f8d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4648,8 +4648,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 7a3cb944d6f..d580f1e0425 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ /* ExecutorPrep state lives in this portal's context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1690,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_list, /* lives in portalContext */
cplan);
/*
@@ -2078,6 +2083,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2107,13 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ /* ExecutorPrep() state lives in caller's active context. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2501,6 +2511,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ CachedPlanPrepData cprep = {0};
List *prep_list;
int i;
@@ -2577,11 +2588,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+
+ /* ExecutorPrep state is per _SPI_execute_plan call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c4fd646b999..4c76e78c1da 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -608,6 +608,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ccdc9bc264a..229b39060ae 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1274,6 +1274,9 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
lappend_int(root->glob->resultRelations,
splan->rootRelation);
}
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels,
+ linitial_int(splan->resultRelations));
}
break;
case T_Append:
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d3964a12a14..249829f59a0 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1639,6 +1639,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2021,7 +2022,11 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+
+ /* ExecutorPrep() state lives in portal context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2034,7 +2039,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_list,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 6661d2c6b73..c1cfd47422c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,7 +93,7 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
@@ -101,6 +101,8 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -137,6 +139,26 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
+/*
+ * Lock acquisition policy for execution locks.
+ *
+ * LOCK_ALL acquires locks on all relations mentioned in the plan,
+ * reproducing the behavior of AcquireExecutorLocks().
+ *
+ * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
+ * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
+ * partitions remaining after performing initial pruning.
+ */
+typedef enum LockPolicy
+{
+ LOCK_ALL,
+ LOCK_UNPRUNED,
+} LockPolicy;
+
+static void AcquireExecutorLocksWithPolicy(List *stmt_list,
+ LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep);
+
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -938,7 +960,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ *
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The resulting ExecPrep structures are saved in cprep for
+ * later reuse by ExecutorStart().
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
@@ -947,7 +974,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -975,13 +1002,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1003,7 +1032,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
}
/*
@@ -1283,6 +1312,10 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the results in
+ * cprep->prep_list. These are intended to be passed later to ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1293,7 +1326,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1315,7 +1349,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (PrepAndCheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1902,6 +1938,38 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
+/*
+ * AcquireExecutorLocksWithPolicy
+ * Acquire or release execution locks for a cached plan according to
+ * the specified policy.
+ *
+ * LOCK_ALL reproduces AcquireExecutorLocks(), locking every relation in
+ * each PlannedStmt's rtable. LOCK_UNPRUNED restricts locking to the
+ * unprunable rels and partitions that survive initial runtime pruning.
+ *
+ * When LOCK_UNPRUNED is used on acquire, ExecutorPrep() is invoked for
+ * each PlannedStmt and the resulting ExecPrep pointers are appended to
+ * cprep->prep_list in cprep->context. On release, the same ExecPrep
+ * list is consulted to determine which relations to unlock and is then
+ * cleaned up with ExecPrepCleanup().
+ */
+static void
+AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ switch (policy)
+ {
+ case LOCK_ALL:
+ AcquireExecutorLocks(stmt_list, acquire);
+ break;
+ case LOCK_UNPRUNED:
+ AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
+ break;
+ default:
+ elog(ERROR, "invalid LockPolicy");
+ }
+}
+
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
@@ -1954,6 +2022,158 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the ExecPrep pointer for each PlannedStmt to cprep->prep_list
+ *
+ * On release, it:
+ * - looks up the ExecPrep object for each PlannedStmt from cprep->prep_list
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ * - calls ExecPrepCleanup() on each ExecPrep
+ *
+ * prep_list is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for ExecPrep happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_list;
+ int i;
+
+ /* Check cprep before dereferencing it. */
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+
+ /*
+ * When releasing locks, use the ExecPrep list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_list = cprep->prep_list;
+ Assert(acquire || list_length(prep_list) == list_length(stmt_list));
+
+ i = 0;
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecPrep *prep;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocks(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_list = lappend(cprep->prep_list, NULL);
+ else
+ i++; /* step past the NULL placeholder added on acquire */
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ prep = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true,
+ cprep->eflags);
+ Assert(prep || plannedstmt->partPruneInfos == NULL);
+ cprep->prep_list = lappend(cprep->prep_list, prep);
+ }
+ else
+ prep = list_nth(prep_list, i++);
+
+ Assert(prep == NULL || prep->prep_estate);
+ if (prep)
+ {
+ EState *prep_estate = prep->prep_estate;
+
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * firstResultRels may contain pruned partitions that must still be
+ * locked to satisfy executor assumptions (see comments in
+ * ExecInitModifyTable()). Ensure they're included here.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+
+ /* Clean up prep if releasing locks. */
+ if (!acquire)
+ ExecPrepCleanup(prep);
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 30d889b54c5..6fb86dc05f6 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -141,6 +141,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..eb211f1ba56 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a82b66d4bc2..c7b8ec4be39 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,27 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_list is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. The
+ * same list is later used to determine which relations to unlock when
+ * releasing execution locks.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_list; /* one ExecPrep per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate ExecPrep objects */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to pass to ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +261,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.47.3
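
The set arithmetic that AcquireExecutorLocksUnpruned() performs after locking the unprunable relations can be sketched outside PostgreSQL roughly as follows. This is only an illustration under stated assumptions: RelidSet, relid_bit() and extra_relids_to_lock() are hypothetical names, and a 64-bit mask stands in for Bitmapset, so only RT indexes below 64 are representable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit N set means RT index N is a member. */
typedef uint64_t RelidSet;

/* Build a one-member set for the given RT index (must be < 64 in this sketch). */
RelidSet
relid_bit(int rtindex)
{
	return (RelidSet) 1 << rtindex;
}

/*
 * Compute the relations still to be locked once the unprunable ones have
 * been handled: the partitions that survived initial pruning, minus the
 * unprunable set, plus the first result relation of each ModifyTable node.
 * This mirrors the bms_difference() and bms_add_member() steps in the patch.
 */
RelidSet
extra_relids_to_lock(RelidSet unpruned, RelidSet unprunable,
					 const int *first_result_rels, int nfirst)
{
	RelidSet	lock_relids = unpruned & ~unprunable;

	for (int i = 0; i < nfirst; i++)
		lock_relids |= relid_bit(first_result_rels[i]);

	return lock_relids;
}
```

With unpruned = {1, 2, 4}, unprunable = {1} and a single first result relation 3, the sketch yields {2, 3, 4}: relation 1 is skipped because it is already locked, and relation 3 is added even though pruning removed it.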
[application/octet-stream] v3-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch (9.3K, 3-v3-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch)
From 052ab8fe38493ca106d749f4e2426a86d0267d59 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 20 Nov 2025 15:35:47 +0900
Subject: [PATCH v3 5/6] Add test exercising prep cleanup on cached-plan
invalidation
Add a regression test that causes a generic plan to become invalid
while pruning-aware setup is running. The pruning expression calls a
function that can perform DDL on a partition, making the plan stale
during reuse.

The test's purpose is to drive execution through the invalidation
path that discards any ExecutorPrep state created before the plan was
found invalid, providing coverage for that cleanup logic.
---
src/backend/utils/cache/plancache.c | 38 +++++++++++++--
src/test/regress/expected/plancache.out | 61 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 50 ++++++++++++++++++++
3 files changed, 144 insertions(+), 5 deletions(-)
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index c1cfd47422c..a9a4e11d1a5 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -103,6 +103,7 @@ static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -1033,6 +1034,9 @@ PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
/* Oops, the race case happened. Release useless locks. */
AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -2069,7 +2073,6 @@ LockRelids(List *rtable, Bitmapset *relids, bool acquire)
* - looks up the ExecPrep object for each PlannedStmt from cprep->prep_list
* (which must already be populated)
* - unlocks the same relations identified during acquire
- * - calls ExecPrepCleanup() on each ExecPrep
*
* prep_list is extended during acquire and must match stmt_list one-to-one
* when releasing locks. Memory allocation for ExecPrep happens in
@@ -2165,15 +2168,40 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
LockRelids(plannedstmt->rtable, lock_relids, acquire);
bms_free(lock_relids);
}
-
- /* Clean up prep if releasing locks. */
- if (!acquire)
- ExecPrepCleanup(prep);
}
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * CachedPlanPrepCleanup
+ * Clean up ExecPrep state built for a generic plan.
+ *
+ * This is used in the corner case where PrepAndCheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the ExecPrep list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+
+ if (cprep == NULL)
+ return;
+
+ foreach(lc, cprep->prep_list)
+ {
+ ExecPrep *prep = (ExecPrep *) lfirst(lc);
+
+ ExecPrepCleanup(prep);
+ }
+
+ list_free(cprep->prep_list);
+ cprep->prep_list = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..26c4c5e10fd 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,64 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..cc7eb4da4d3 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,53 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
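
The cleanup step added by the patch above can be modelled in isolation. The sketch below is not PostgreSQL code: DemoPrep, DemoPrepList and both functions are hypothetical stand-ins for ExecPrep, cprep->prep_list and CachedPlanPrepCleanup(). It assumes the list holds one slot per statement, with NULL for utility statements, and that cleanup empties the list so a repeated call does nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ExecPrep; only records whether cleanup ran. */
typedef struct DemoPrep
{
	int			cleaned;
} DemoPrep;

/* Hypothetical stand-in for the prep_list kept in CachedPlanPrepData. */
typedef struct DemoPrepList
{
	DemoPrep  **items;			/* one slot per statement, NULL for utility */
	int			nitems;
} DemoPrepList;

/*
 * Clean every non-NULL entry, then reset the list so that nothing is
 * cleaned twice. Returns the number of entries that were cleaned.
 */
int
demo_prep_list_cleanup(DemoPrepList *list)
{
	int			ncleaned = 0;

	if (list == NULL)
		return 0;

	for (int i = 0; i < list->nitems; i++)
	{
		if (list->items[i] == NULL)
			continue;			/* placeholder for a utility statement */
		list->items[i]->cleaned = 1;
		ncleaned++;
	}

	list->items = NULL;
	list->nitems = 0;
	return ncleaned;
}

/* Usage: a utility placeholder followed by one prepared statement. */
int
demo_invalidation_path(void)
{
	DemoPrep	prep = {0};
	DemoPrep   *slots[] = {NULL, &prep};
	DemoPrepList list = {slots, 2};
	int			total = demo_prep_list_cleanup(&list);

	/* The second call finds an empty list and cleans nothing. */
	total += demo_prep_list_cleanup(&list);
	return total;
}
```

Resetting the list after cleanup is what makes the invalidation path safe to follow with a replan: the caller sees an empty list rather than pointers to state that has already been discarded.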
[application/octet-stream] v3-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (28.7K, 4-v3-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From 11e0262e31e35539f50e96531559db6cd7e32160 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v3 2/6] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present.

ExecutorStart() now expects QueryDesc->prep to point at such an
ExecPrep object. If no prep was supplied by the caller, it
invokes ExecutorPrep() itself and adopts the resulting EState
for the duration of the query. This keeps the executor startup
behaviour unchanged while making the setup work callable
separately when needed.

CreateQueryDesc() grows a prep argument and stores it in the
QueryDesc. Portals, SPI, SQL functions, and EXPLAIN are wired
to carry an optional ExecPrep pointer alongside the PlannedStmt
list, but most callers still pass NULL and let ExecutorStart()
perform the setup lazily.

Add the ExecPrep struct and ExecPrepCleanup() to encapsulate
ownership of the prepared EState and any caller-specific
cleanup hook. Update executor/README and related comments to
document the new control flow and the separation between
preparation and execution.
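
The ownership hand-off described above can be illustrated with a small standalone model. Everything in it is hypothetical (SketchEState, SketchPrep and the sketch_* functions are not part of the patch); it only assumes what the text states, namely that ExecutorStart() adopts the prepared EState and that ExecPrepCleanup() frees the EState only while the prep still owns it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for EState; only records whether it was freed. */
typedef struct SketchEState
{
	bool		freed;
} SketchEState;

/* Hypothetical stand-in for ExecPrep. */
typedef struct SketchPrep
{
	SketchEState *estate;
	bool		owns_estate;
} SketchPrep;

/* Models executor startup adopting the prepared state. */
SketchEState *
sketch_start(SketchPrep *prep)
{
	prep->owns_estate = false;	/* the executor now frees the state itself */
	return prep->estate;
}

/* Models prep cleanup: free the state only if nobody adopted it. */
void
sketch_prep_cleanup(SketchPrep *prep)
{
	if (prep == NULL)
		return;
	if (prep->estate != NULL && prep->owns_estate)
		prep->estate->freed = true;
}

/* Usage: report whether cleanup freed the state, with or without adoption. */
bool
sketch_freed_after_cleanup(bool adopted)
{
	SketchEState estate = {false};
	SketchPrep	prep = {&estate, true};

	if (adopted)
		(void) sketch_start(&prep);
	sketch_prep_cleanup(&prep);
	return estate.freed;
}
```

In this model, cleanup frees the state when the plan was discarded before startup, and leaves it alone once startup has adopted it, which avoids a double free at executor shutdown.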
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 11 +-
src/backend/executor/README | 8 +-
src/backend/executor/execMain.c | 179 +++++++++++++++++++++++----
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 3 +
src/backend/executor/functions.c | 1 +
src/backend/executor/spi.c | 10 ++
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 27 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 3 +-
src/include/executor/executor.h | 11 ++
src/include/nodes/execnodes.h | 48 +++++++
src/include/utils/portal.h | 2 +
21 files changed, 286 insertions(+), 40 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..5efbb0949c2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -870,7 +870,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 1ccc2e55c64..9eabe4920cd 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -334,7 +334,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e699f8595e..d6ab3697dd9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -548,7 +549,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, prep, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 93ef1ad106f..3cca6d45ec1 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -993,6 +993,7 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ef7c0d624f1..30cbf9f264f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -437,7 +437,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index ec96c2efcd3..ac1ddd25aba 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL),
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 34b6410d6a2..afd449c73ba 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,6 +576,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_list;
ListCell *p;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -585,6 +587,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i;
if (es->memory)
{
@@ -650,14 +653,20 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_list = NIL;
/* Explain each query */
+ i = 0;
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecPrep *prep = prep_list ?
+ (ExecPrep *) list_nth(prep_list, i) : NULL;
+ i++;
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..95b5ec58c55 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,10 +291,16 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Performs range
+ table initialization, permission checks, and initial partition pruning.
+ Returns an ExecPrep wrapper with EState that ExecutorStart may reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
+ CreateExecutorState (or reuse one from ExecPrep if present)
creates per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..39de0b93a1c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -171,8 +171,26 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
*/
- estate = CreateExecutorState();
+ if (queryDesc->prep == NULL)
+ queryDesc->prep = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ Assert(queryDesc->prep);
+ estate = queryDesc->prep->prep_estate;
+
+ /*
+ * Executor is adopting the prep's EState. Mark it so ExecPrepCleanup()
+ * doesn't try to free it redundantly.
+ */
+ queryDesc->prep->owns_estate = false;
+
queryDesc->estate = estate;
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +281,136 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart().
+ *
+ * Returns an ExecPrep wrapper that owns the EState and can be reused
+ * or cleaned up later.
+ */
+ExecPrep *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ bool snapshot_set;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Pruning may use expressions that require an active snapshot. */
+ snapshot_set = false;
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snapshot_set = true;
+ }
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false -- they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ /* Release snapshot if we got one */
+ if (snapshot_set)
+ PopActiveSnapshot();
+
+ return CreateExecPrep(estate, CurrentMemoryContext, NULL, NULL);
+}
+
+/*
+ * CreateExecPrep: initialize ExecPrep wrapper with optional cleanup metadata.
+ */
+ExecPrep *
+CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg)
+{
+ ExecPrep *prep = palloc0(sizeof(ExecPrep));
+
+ prep->prep_estate = estate;
+ prep->context = context;
+ prep->cleanup = cleanup;
+ prep->cleanup_arg = cleanup_arg;
+ prep->owns_estate = true;
+
+ return prep;
+}
+
+/*
+ * ExecPrepCleanup: free ExecPrep resources not adopted by the executor.
+ *
+ * Only frees the EState if it wasn't taken over by ExecutorStart().
+ * Always runs the optional user-defined cleanup callback.
+ */
+void
+ExecPrepCleanup(ExecPrep *prep)
+{
+ if (prep == NULL)
+ return;
+
+ if (prep->prep_estate && prep->owns_estate)
+ {
+ ExecCloseRangeTableRelations(prep->prep_estate);
+ FreeExecutorState(prep->prep_estate);
+ }
+
+ if (prep->cleanup)
+ prep->cleanup(prep->cleanup_arg);
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -824,7 +972,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
@@ -838,37 +985,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->prep);
+ Assert(estate == queryDesc->prep->prep_estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f098a5557cf..aedbd9566d6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1281,6 +1281,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 88b150c8d77..187a480e508 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2368,6 +2368,9 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /* Wouldn't be available at ExecutorPrep() time. */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 630d708d2a3..633310c5f5b 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1362,6 +1362,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 653500b38dc..7a3cb944d6f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_list;
+ int i;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_list = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,12 +2619,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ i = 0;
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecPrep *prep = prep_list ?
+ list_nth(prep_list, i) : NULL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ i++;
+
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2690,6 +2699,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ prep,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 2bd89102686..d3964a12a14 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,6 +1232,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2033,6 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fde78c55160..82c295502b0 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -66,6 +67,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -78,6 +80,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->prep = prep; /* executor prep output */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -112,6 +115,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /* ExecPrep cleanup if necessary */
+ if (qdesc->prep)
+ {
+ ExecPrepCleanup(qdesc->prep);
+ qdesc->prep = NULL;
+ }
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -123,6 +133,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep: ExecPrep for the plan (output of ExecutorPrep())
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +146,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -146,7 +158,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, prep, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -489,6 +501,9 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->preps ?
+ (ExecPrep *) linitial(portal->preps) :
+ NULL,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1185,6 +1200,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1221,14 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecPrep *prep = portal->preps ?
+ list_nth(portal->preps, i) : NULL;
+
+ i++;
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1286,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1295,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 943da087c9f..313f8ef2fdc 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -298,6 +299,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->preps = preps;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 6e51d50efc7..6aa8b275aa2 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..c18530f5d11 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -18,7 +18,6 @@
#include "nodes/execnodes.h"
#include "tcop/dest.h"
-
/* ----------------
* query descriptor:
*
@@ -35,6 +34,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecPrep *prep; /* output of ExecutorPrep() or NULL */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +57,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..3579926d4e8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,16 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern ExecPrep *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning,
+ int eflags);
+extern ExecPrep *CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg);
+extern void ExecPrepCleanup(ExecPrep *prep);
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..8bdecd631bf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -772,6 +772,54 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
+/*
+ * ExecPrep: encapsulates executor preparation results for a PlannedStmt.
+ *
+ * ExecutorPrep() factors out executor setup steps such as initializing the
+ * range table, checking permissions, and executing initial partition pruning.
+ * ExecutorStart() can reuse the prepared EState instead of repeating that
+ * work, and other callers (such as plan cache validation) can use it without
+ * running the full plan.
+ */
+
+/*
+ * Optional callback to clean up user-specific resources associated with
+ * ExecPrep.
+ */
+typedef void (*execprep_cleanup_fn)(void *prep);
+
+typedef struct ExecPrep
+{
+ /*
+ * Context in which this struct and all subsidiary allocations were made.
+ * This context must remain alive until ExecPrepCleanup is called.
+ */
+ MemoryContext context;
+
+ /*
+ * Partially-initialized executor state used for permission checks and
+ * pruning. May be adopted directly by ExecutorStart(), in which case
+ * ExecPrepCleanup will skip freeing it.
+ */
+ EState *prep_estate;
+
+ /*
+ * True if ExecPrepCleanup() must free the EState. If the executor adopts
+ * prep_estate, this is set to false to avoid double-free.
+ */
+ bool owns_estate;
+
+ /*
+ * Optional caller-supplied cleanup hook to run during ExecPrepCleanup.
+ * Useful for releasing external resources associated with the prep.
+ */
+ execprep_cleanup_fn cleanup;
+
+ /*
+ * Opaque pointer to pass to the cleanup hook.
+ */
+ void *cleanup_arg;
+} ExecPrep;
/*
* ExecRowMark -
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 5ffa6fd5cc8..013bcc3bd8e 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *preps; /* list of ExecPreps where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
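For readers following the ownership rule described in the ExecPrep comments above (ExecPrepCleanup() frees the EState only while owns_estate is true, and always runs the optional hook), here is a minimal standalone sketch of that pattern. Everything in it is hypothetical: ToyState, ToyPrep, toy_prep_create(), toy_prep_adopt(), toy_prep_cleanup(), toy_count_hook() and toy_states_freed are illustrative names, not PostgreSQL APIs, and it uses plain calloc/free rather than memory contexts. The patch does not show where owns_estate is cleared, so toy_prep_adopt() is only an assumption about what an adopting caller would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for EState; not the PostgreSQL struct. */
typedef struct ToyState
{
	int			placeholder;
} ToyState;

typedef void (*toy_cleanup_fn) (void *arg);

/* Hypothetical analogue of ExecPrep, reduced to the ownership fields. */
typedef struct ToyPrep
{
	ToyState   *state;			/* partially initialized state */
	bool		owns_state;		/* true until someone adopts the state */
	toy_cleanup_fn cleanup;		/* optional caller-supplied hook */
	void	   *cleanup_arg;	/* opaque argument for the hook */
} ToyPrep;

/* Counts how many states toy_prep_cleanup() has freed (illustration only). */
int			toy_states_freed = 0;

/* Example hook: bumps the int counter that arg points to. */
void
toy_count_hook(void *arg)
{
	(*(int *) arg)++;
}

/* Build a prep object that initially owns its state. */
ToyPrep *
toy_prep_create(toy_cleanup_fn cleanup, void *cleanup_arg)
{
	ToyPrep    *prep = calloc(1, sizeof(ToyPrep));

	if (prep == NULL)
		return NULL;
	prep->state = calloc(1, sizeof(ToyState));
	if (prep->state == NULL)
	{
		free(prep);
		return NULL;
	}
	prep->owns_state = true;
	prep->cleanup = cleanup;
	prep->cleanup_arg = cleanup_arg;
	return prep;
}

/* Assumed adoption step: the adopter becomes responsible for the state. */
ToyState *
toy_prep_adopt(ToyPrep *prep)
{
	prep->owns_state = false;
	return prep->state;
}

/* Free the state only if still owned; always run the hook if one is set. */
void
toy_prep_cleanup(ToyPrep *prep)
{
	if (prep == NULL)
		return;
	if (prep->state && prep->owns_state)
	{
		free(prep->state);
		prep->state = NULL;
		toy_states_freed++;
	}
	if (prep->cleanup)
		prep->cleanup(prep->cleanup_arg);
}
```

In this sketch a caller that never hands the state to an adopter relies on toy_prep_cleanup() to free it, while a caller that goes through toy_prep_adopt() must free the returned state itself; the hook runs in both cases.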
[application/octet-stream] v3-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (6.7K, 5-v3-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From 733e3c712ec59b75da031694155c98476f290f37 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 17 Nov 2025 17:40:26 +0900
Subject: [PATCH v3 6/6] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
ExecPrep pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.
This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users, which a later
patch will use to support pruning-aware locking.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 32 +++++++++++++++++++++++--
src/test/regress/expected/plancache.out | 30 +++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 27 +++++++++++++++++++++
3 files changed, 87 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index d81718ea84e..ed7352fce61 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ ExecPrep *prep; /* ExecutorPrep() output for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ int i;
/*
* Clean up after previous query, if there was one.
@@ -695,11 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -720,9 +732,12 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ i = 0;
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ ExecPrep *prep = cprep.prep_list ? list_nth(cprep.prep_list, i++) :
+ NULL;
execution_state *newes;
/*
@@ -764,6 +779,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep = prep;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1362,8 +1378,20 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ if (es->prep)
+ {
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per-call lifetime.
+ */
+ EState *prep_estate = es->prep->prep_estate;
+
+ MemoryContextSetParent(prep_estate->es_query_cxt, fcache->subcontext);
+ }
+
es->qd = CreateQueryDesc(es->stmt,
- NULL,
+ es->prep,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 26c4c5e10fd..bf937364716 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -458,4 +458,34 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index cc7eb4da4d3..71320799040 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -272,4 +272,31 @@ explain (verbose, costs off) execute inval_during_pruning_q;
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+
reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v3-0003-Reuse-partition-pruning-results-in-parallel-worke.patch (9.1K, 6-v3-0003-Reuse-partition-pruning-results-in-parallel-worke.patch)
From d9d95e09961dcb8236e5fe7b2da4a37fda8e5944 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v3 3/6] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce ExecCheckInitialPruningResults() to verify that the results
match what the worker would compute. This check helps catch
inconsistencies across leader and worker pruning logic.
While valuable on its own, this change also lays the foundation for
future optimizations where the leader may take locks only on
surviving partitions. Ensuring that workers follow identical pruning
decisions makes such selective locking safe.
---
src/backend/executor/execParallel.c | 67 +++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 35 +++++++++++++++
src/include/executor/execPartition.h | 1 +
3 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aedbd9566d6..751590adcc9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -65,6 +66,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -608,12 +611,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -642,6 +651,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -668,6 +679,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -769,6 +790,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1263,10 +1294,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ ExecPrep *prep = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1279,9 +1315,38 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false, 0);
+ Assert(prep->prep_estate);
+
+ prep->prep_estate->es_part_prune_results = part_prune_results;
+ prep->prep_estate->es_unpruned_relids =
+ bms_add_members(prep->prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * Verify that the pruning results passed from the leader match
+ * what the worker would independently compute.
+ */
+ ExecCheckInitialPruningResults(prep->prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
- NULL,
+ prep,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 187a480e508..3b450e3373f 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1872,6 +1872,41 @@ ExecDoInitialPruning(EState *estate)
}
}
+/*
+ * ExecCheckInitialPruningResults
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ */
+void
+ExecCheckInitialPruningResults(EState *estate)
+{
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (bms_nonempty_difference(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unpruned_relids in parallel worker");
+ }
+}
+
/*
* ExecInitPartitionExecPruning
* Initialize the data structures needed for runtime "exec" partition
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index ba8cc594fc9..126efd008e5 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -132,6 +132,7 @@ typedef struct PartitionPruneState
extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
+extern void ExecCheckInitialPruningResults(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
--
2.47.3
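The leader/worker consistency check that ExecCheckInitialPruningResults() performs can be illustrated with a small standalone sketch. It assumes each pruning result fits in a 64-bit mask, whereas the patch uses Bitmapset and bms_nonempty_difference(); toy_nonempty_difference() and toy_check_pruning() are hypothetical names introduced only for this example. Each worker-computed set is compared against the leader's set at the same index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * True if set a has at least one member that is not in set b.  This mirrors
 * the meaning of bms_nonempty_difference(a, b), using a bitmask instead of
 * a Bitmapset.
 */
bool
toy_nonempty_difference(uint64_t a, uint64_t b)
{
	return (a & ~b) != 0;
}

/*
 * Compare the worker's recomputed sets against the leader's, entry by
 * entry.  Returns the index of the first entry where the worker kept a
 * subplan the leader did not, or -1 if there is no such entry.
 */
int
toy_check_pruning(const uint64_t *leader, const uint64_t *worker, size_t n)
{
	for (size_t i = 0; i < n; i++)
	{
		if (toy_nonempty_difference(worker[i], leader[i]))
			return (int) i;
	}
	return -1;
}
```

Note that a check of this shape is one-directional: it flags a worker set containing members the leader's set lacks, but not the reverse. Whether a symmetric comparison is wanted is a design choice left open here.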
[application/octet-stream] v3-0001-Refactor-partition-pruning-initialization-for-cla.patch (7.7K, 7-v3-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 243d407de86b0a73b9bd8c8dbc541f630eb33747 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:18:24 +0900
Subject: [PATCH v3 1/6] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Also simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++-----------
src/include/executor/execPartition.h | 1 +
2 files changed, 43 insertions(+), 28 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index aa12e9ad2ea..88b150c8d77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1772,6 +1771,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1796,6 +1798,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1803,11 +1828,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1825,20 +1850,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -1846,8 +1864,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -1965,14 +1981,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2206,8 +2220,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2219,8 +2233,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3b3f46aced0..ba8cc594fc9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2025-11-23 12:17 Tender Wang <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Tender Wang @ 2025-11-23 12:17 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Tom Lane <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Thu, Nov 20, 2025 at 15:30, Amit Langote <[email protected]> wrote:
> On Mon, Nov 17, 2025 at 9:50 PM Amit Langote <[email protected]>
> wrote:
> > On Wed, Nov 12, 2025 at 11:17 PM Amit Langote <[email protected]>
> wrote:
> > > * Enable pruning-aware locking in cached / generic plan reuse (0004):
> > > extends GetCachedPlan() and CheckCachedPlan() to call ExecutorPrep()
> > > on each PlannedStmt in the CachedPlan, locking only surviving
> > > partitions. Adds CachedPlanPrepData to pass this through plan cache
> > > APIs and down to execution via QueryDesc. Also reinstates the
> > > firstResultRel locking rule added in 28317de72 but later lost due to
> > > revert of the earlier pruning patch, to ensure correctness when all
> > > target partitions are pruned.
> >
> > Looking at the changes to executor/function.c, I also noticed that I
> > had mistakenly allocated the ExecutorPrep state in
> > SQLFunctionCache.fcontext whereas the correct context for execution
> > related state is SQLFunctionCache.subcontext. In the updated patch,
> > I've made postquel_start() reparent the prep EState's es_query_cxt to
> > subcontext from fcontext. I also did not have a test case that
> > exercised cached plan reuse for SQL functions, so I added one. I split
> > the function.c's GetCachedPlan() + CachedPlanPrepData plumbing into a
> > new patch 0005 so it can be reviewed separately, since it is the only
> > non-mechanical call-site change.
>
> I also noticed a bug in the prep cleanup logic that runs when a cached
> plan becomes invalid during the prep phase. Patch 0005 fixes that and
> adds a regression test that exercises the invalidation path. This will
> be folded into 0004 later.
>
I spent some time looking at these patches.
I searched all the places that call GetCachedPlan(), and we always pass
&cprep (a CachedPlanPrepData) to GetCachedPlan().
In PrepAndCheckCachedPlan(), if plan_cache_mode is force_generic_plan,
the LockPolicy is always LOCK_UNPRUNED, because cprep is never NULL.
It seems that the LockPolicy can never be LOCK_ALL. Am I missing
something here?
--
Thanks,
Tender Wang
* Re: generic plans and "initial" pruning
@ 2025-11-24 03:29 Chao Li <[email protected]>
parent: Amit Langote <[email protected]>
1 sibling, 1 reply; 82+ messages in thread
From: Chao Li @ 2025-11-24 03:29 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Hi, Amit,
Locking only surviving partitions sounds like a good optimization. I have started reviewing this patch set, but I cannot finish the review in one day. I will post comments as I finish reviewing each commit.
> On Nov 20, 2025, at 15:30, Amit Langote <[email protected]> wrote:
>
> <v3-0004-Use-pruning-aware-locking-in-cached-plans.patch><v3-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch><v3-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch><v3-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch><v3-0003-Reuse-partition-pruning-results-in-parallel-worke.patch><v3-0001-Refactor-partition-pruning-initialization-for-cla.patch>
0001 splits the creation of es_part_prune_states into a new function, ExecCreatePartitionPruneStates(), to make the code clearer, as you state in the commit message. However, nothing calls the new function, so 0001 is not self-contained, which is unusual compared with the patches I have reviewed so far. I would suggest having ExecDoInitialPruning() call ExecCreatePartitionPruneStates() when es_part_prune_states is still NIL, so that the current logic is unchanged and 0001 can be pushed independently.
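The suggested fallback could look roughly like the sketch below. It uses simplified stand-in types with plain counters instead of the real EState lists, and every sketch_* name is hypothetical, so it only illustrates the "create the states lazily if nobody did it yet" control flow, not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for EState; NOT the real PostgreSQL struct. */
typedef struct SketchEState
{
    int         n_prune_infos;      /* stands in for es_part_prune_infos */
    int         n_prune_states;     /* stands in for es_part_prune_states */
    int         n_prune_results;    /* stands in for es_part_prune_results */
} SketchEState;

/* Stand-in for ExecCreatePartitionPruneStates(): one state per info. */
static void
sketch_create_prune_states(SketchEState *estate)
{
    assert(estate->n_prune_states == 0);
    estate->n_prune_states = estate->n_prune_infos;
}

/* Stand-in for ExecDoInitialPruning() with the suggested fallback. */
static void
sketch_do_initial_pruning(SketchEState *estate)
{
    /* Suggested fallback: build the states here if no caller did so. */
    if (estate->n_prune_states == 0)
        sketch_create_prune_states(estate);

    /* Pruning must not have run already. */
    assert(estate->n_prune_results == 0);

    /* One pruning result is saved per prune state. */
    estate->n_prune_results = estate->n_prune_states;
}
```

With this shape, a caller that invokes only the pruning step still gets one result per PartitionPruneInfo, while a caller that created the states earlier skips the fallback.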
0002 moves the permission checks and related logic from InitPlan() to the new function ExecutorPrep(). The commit message says “executor setup logic unchanged”. However, the old code did not call PushActiveSnapshot() before the permission check, whereas the patch does, which may change behavior. Why is PushActiveSnapshot() added?
Actually, I am still trying to understand 0002-0004; it will take me some time to fully understand the patches. I wanted to raise the above comments first. I will continue reviewing tomorrow.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
* Re: generic plans and "initial" pruning
@ 2025-11-25 01:56 Amit Langote <[email protected]>
parent: Tender Wang <[email protected]>
0 siblings, 0 replies; 82+ messages in thread
From: Amit Langote @ 2025-11-25 01:56 UTC (permalink / raw)
To: Tender Wang <[email protected]>; +Cc: Tom Lane <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Sun, Nov 23, 2025 at 9:17 PM Tender Wang <[email protected]> wrote:
> On Thu, Nov 20, 2025 at 15:30, Amit Langote <[email protected]> wrote:
>>
>> On Mon, Nov 17, 2025 at 9:50 PM Amit Langote <[email protected]> wrote:
>> > On Wed, Nov 12, 2025 at 11:17 PM Amit Langote <[email protected]> wrote:
>> > > * Enable pruning-aware locking in cached / generic plan reuse (0004):
>> > > extends GetCachedPlan() and CheckCachedPlan() to call ExecutorPrep()
>> > > on each PlannedStmt in the CachedPlan, locking only surviving
>> > > partitions. Adds CachedPlanPrepData to pass this through plan cache
>> > > APIs and down to execution via QueryDesc. Also reinstates the
>> > > firstResultRel locking rule added in 28317de72 but later lost due to
>> > > revert of the earlier pruning patch, to ensure correctness when all
>> > > target partitions are pruned.
>> >
>> > Looking at the changes to executor/function.c, I also noticed that I
>> > had mistakenly allocated the ExecutorPrep state in
>> > SQLFunctionCache.fcontext whereas the correct context for execution
>> > related state is SQLFunctionCache.subcontext. In the updated patch,
>> > I've made postquel_start() reparent the prep EState's es_query_cxt to
>> > subcontext from fcontext. I also did not have a test case that
>> > exercised cached plan reuse for SQL functions, so I added one. I split
>> > the function.c's GetCachedPlan() + CachedPlanPrepData plumbing into a
>> > new patch 0005 so it can be reviewed separately, since it is the only
>> > non-mechanical call-site change.
>>
>> I also noticed a bug in the prep cleanup logic that runs when a cached
>> plan becomes invalid during the prep phase. Patch 0005 fixes that and
>> adds a regression test that exercises the invalidation path. This will
>> be folded into 0004 later.
>
> I spent some time looking at these patches.
>
> I searched all the places that call GetCachedPlan(), and we always pass &cprep (a CachedPlanPrepData) to GetCachedPlan().
> In PrepAndCheckCachedPlan(), if plan_cache_mode is force_generic_plan, the LockPolicy is always LOCK_UNPRUNED, because cprep is never NULL.
> It seems that the LockPolicy can never be LOCK_ALL. Am I missing something here?
Yes, eventually LockPolicy may end up redundant and we might not need
AcquireExecutorLocksPolicy() at all, with a single locking path
covering both cases.
My goal initially was to stage the changes across call sites: keep a
LOCK_ALL path for callers that still use the old lock-everything-up-front
behaviour, and gradually convert other callers to pass a
non-NULL CachedPlanPrepData and handle the prep_list it may return, so
that GetCachedPlan() can perform LOCK_UNPRUNED locking internally.
That is why GetCachedPlan() accepts a possibly NULL cprep and why
LockPolicy exists as a separate knob.
For example, I decided to split out function.c refactoring of plan
cache usage into its own patch. That made me realise that new users of
GetCachedPlan() may appear that first adopt the simpler LOCK_ALL
behaviour and only later switch to LOCK_UNPRUNED when pruning-aware
locking becomes useful for them. Keeping the two paths preserves that
incremental route and avoids forcing every new user to adopt
CachedPlanPrepData and LOCK_UNPRUNED locking up front. I have not yet
decided whether that two-path structure is a good idea, but I am inclined to keep
it for now. I would be happy to hear opinions on this.
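To make the two paths concrete, here is a rough sketch of how the policy could follow from whether the caller passes prep data. The enum, struct, and function names are hypothetical stand-ins, and the rule "non-NULL cprep means LOCK_UNPRUNED" is an assumption based on the discussion above rather than a copy of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in the patch set. */
typedef enum SketchLockPolicy
{
    SKETCH_LOCK_ALL,            /* lock every relation in the plan */
    SKETCH_LOCK_UNPRUNED        /* lock only relations surviving initial pruning */
} SketchLockPolicy;

typedef struct SketchPrepData
{
    int         nprep;          /* placeholder for the prep_list handed back */
} SketchPrepData;

/* A NULL cprep selects the old behaviour; non-NULL opts in to pruning. */
static SketchLockPolicy
sketch_choose_lock_policy(const SketchPrepData *cprep)
{
    return (cprep != NULL) ? SKETCH_LOCK_UNPRUNED : SKETCH_LOCK_ALL;
}

/* Count the partitions that would be locked under the given policy. */
static int
sketch_count_locks(SketchLockPolicy policy, const bool *survives, int nparts)
{
    int         nlocks = 0;

    assert(nparts >= 0);
    for (int i = 0; i < nparts; i++)
    {
        if (policy == SKETCH_LOCK_ALL || survives[i])
            nlocks++;
    }
    return nlocks;
}
```

A caller that has not been converted passes NULL and keeps locking everything, which is the incremental route described above.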
--
Thanks, Amit Langote
* Re: generic plans and "initial" pruning
@ 2025-11-25 08:31 Amit Langote <[email protected]>
parent: Chao Li <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2025-11-25 08:31 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Hi Evan,
On Mon, Nov 24, 2025 at 12:30 PM Chao Li <[email protected]> wrote:
>
> Hi, Amit,
>
> Locking only surviving partitions sounds like a good optimization. I have started reviewing this patch set, but I cannot finish the review in one day. I will post comments as I finish reviewing each commit.
Thank you very much for taking the time to review.
> > On Nov 20, 2025, at 15:30, Amit Langote <[email protected]> wrote:
> >
> > <v3-0004-Use-pruning-aware-locking-in-cached-plans.patch><v3-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch><v3-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch><v3-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch><v3-0003-Reuse-partition-pruning-results-in-parallel-worke.patch><v3-0001-Refactor-partition-pruning-initialization-for-cla.patch>
>
>
> 0001 splits the creation of es_part_prune_states into a new function, ExecCreatePartitionPruneStates(), to make the code clearer, as you state in the commit message. However, nothing calls the new function, so 0001 is not self-contained, which is unusual compared with the patches I have reviewed so far.
Oops, that is not intentional.
> I would suggest having ExecDoInitialPruning() call ExecCreatePartitionPruneStates() when es_part_prune_states is still NIL, so that the current logic is unchanged and 0001 can be pushed independently.
0002 adds a call to ExecDoInitialPruning() in ExecutorPrep(), preceded
by a call to ExecCreatePartitionPruneStates(), and that is how I think
it should be. So in the attached updated 0001, I have made InitPlan()
call ExecCreatePartitionPruneStates() before calling
ExecDoInitialPruning().
> 0002 moves the permission checks and related logic from InitPlan() to the new function ExecutorPrep(). The commit message says “executor setup logic unchanged”. However, the old code did not call PushActiveSnapshot() before the permission check, whereas the patch does, which may change behavior. Why is PushActiveSnapshot() added?
That is a valid concern.
I found it necessary because the initial pruning code (which runs in
ExecDoInitialPruning()) may require ActiveSnapshot to be valid if
pruning expressions end up calling code that invokes
EnsurePortalSnapshotExists(). That requirement already existed when
ExecDoInitialPruning() was driven from ExecutorStart(), but
ExecutorPrep() can now be called from places that do not otherwise
push a snapshot. The snapshot push is only there to cover those
callers. It does not change permission checking itself, it just
ensures ExecutorPrep() runs with the same preconditions that
ExecutorStart() always had.
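The pattern is "push a snapshot only if none is active, and pop only what you pushed". A minimal sketch of that pattern follows, using a plain depth counter in place of the real active-snapshot stack; all sketch_* names are hypothetical and nothing here calls actual snapmgr functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Depth counter standing in for the active-snapshot stack. */
static int  sketch_snapshot_depth = 0;

static bool
sketch_active_snapshot_set(void)
{
    return sketch_snapshot_depth > 0;
}

static void
sketch_push_snapshot(void)
{
    sketch_snapshot_depth++;
}

static void
sketch_pop_snapshot(void)
{
    assert(sketch_snapshot_depth > 0);
    sketch_snapshot_depth--;
}

/*
 * Stand-in for the prep step: guarantees a snapshot while "pruning" runs
 * and leaves the stack exactly as the caller had it.  Returns the stack
 * depth observed while the work was being done.
 */
static int
sketch_prep(void)
{
    bool        pushed = false;
    int         depth_during;

    if (!sketch_active_snapshot_set())
    {
        sketch_push_snapshot();
        pushed = true;
    }
    assert(sketch_active_snapshot_set());

    depth_during = sketch_snapshot_depth;   /* pruning would run here */

    /* Pop only the snapshot pushed above, never the caller's. */
    if (pushed)
        sketch_pop_snapshot();

    return depth_during;
}
```

Callers that already have an active snapshot see no change to the stack, which matches the intent that the push only covers callers without one.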
> Actually, I am still trying to understand 0002-0004; it will take me some time to fully understand the patches. I wanted to raise the above comments first. I will continue reviewing tomorrow.
Thanks, I appreciate your review.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v4-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (28.8K, 2-v4-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From a004aab1ce9418a2f6273d1a67673b3d4a7c218b Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v4 2/6] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present.
ExecutorStart() now expects QueryDesc->prep to point at such an
ExecPrep object. If no prep was supplied by the caller, it
invokes ExecutorPrep() itself and adopts the resulting EState
for the duration of the query. This keeps the executor startup
behaviour unchanged while making the setup work callable
separately when needed.
CreateQueryDesc() grows a prep argument and stores it in the
QueryDesc. Portals, SPI, SQL functions, and EXPLAIN are wired
to carry an optional ExecPrep pointer alongside the PlannedStmt
list, but most callers still pass NULL and let ExecutorStart()
perform the setup lazily.
Add the ExecPrep struct and ExecPrepCleanup() to encapsulate
ownership of the prepared EState and any caller-specific
cleanup hook. Update executor/README and related comments to
document the new control flow and the separation between
preparation and execution.
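As a rough illustration of the ownership handoff described above, the sketch below models a prep wrapper whose state is freed by cleanup only if the executor has not adopted it, while the optional callback always runs. It uses malloc/free and hypothetical sketch_* names in place of EState, memory contexts, and the real ExecPrep API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct SketchState
{
    int         dummy;          /* placeholder for executor metadata */
} SketchState;

typedef void (*sketch_cleanup_fn) (void *arg);

typedef struct SketchPrep
{
    SketchState *state;
    bool        owns_state;     /* true until the executor adopts the state */
    sketch_cleanup_fn cleanup;  /* optional caller-specific hook */
    void       *cleanup_arg;
} SketchPrep;

/* Example hook: counts how many times cleanup ran. */
static void
sketch_count_cleanup(void *arg)
{
    (*(int *) arg)++;
}

/* Build a wrapper that initially owns a freshly allocated state. */
static SketchPrep
sketch_prep_create(sketch_cleanup_fn cleanup, void *cleanup_arg)
{
    SketchPrep  prep;

    prep.state = malloc(sizeof(SketchState));
    if (prep.state == NULL)
        abort();
    prep.state->dummy = 0;
    prep.owns_state = true;
    prep.cleanup = cleanup;
    prep.cleanup_arg = cleanup_arg;
    return prep;
}

/* The "executor" takes over the state; the wrapper must not free it. */
static SketchState *
sketch_adopt(SketchPrep *prep)
{
    prep->owns_state = false;
    return prep->state;
}

/* Frees the state only if still owned; always runs the hook if present. */
static bool
sketch_prep_cleanup(SketchPrep *prep)
{
    bool        freed = false;

    if (prep->state != NULL && prep->owns_state)
    {
        free(prep->state);
        prep->state = NULL;
        freed = true;
    }
    if (prep->cleanup != NULL)
        prep->cleanup(prep->cleanup_arg);
    return freed;
}
```

After adoption, whoever took the state is responsible for releasing it, which is why the cleanup path only reports a free for the non-adopted case.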
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 7 +-
src/backend/commands/extension.c | 1 +
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 11 +-
src/backend/executor/README | 8 +-
src/backend/executor/execMain.c | 180 ++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 1 +
src/backend/executor/execPartition.c | 3 +
src/backend/executor/functions.c | 1 +
src/backend/executor/spi.c | 10 ++
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 27 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 3 +-
src/include/executor/executor.h | 11 ++
src/include/nodes/execnodes.h | 48 +++++++
src/include/utils/portal.h | 2 +
21 files changed, 286 insertions(+), 41 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index cef452584e5..5efbb0949c2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -870,7 +870,7 @@ BeginCopyTo(ParseState *pstate,
((DR_copy *) dest)->cstate = cstate;
/* Create a QueryDesc requesting no output */
- cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ cstate->queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 1ccc2e55c64..9eabe4920cd 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -334,7 +334,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
+ queryDesc = CreateQueryDesc(plan, NULL, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e699f8595e..d6ab3697dd9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -548,7 +549,7 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
dest = None_Receiver;
/* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ queryDesc = CreateQueryDesc(plannedstmt, prep, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, instrument_option);
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index ebc204c4462..9429fc2d17d 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -993,6 +993,7 @@ execute_sql_string(const char *sql, const char *filename)
QueryDesc *qdesc;
qdesc = CreateQueryDesc(stmt,
+ NULL,
sql,
GetActiveSnapshot(), NULL,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index ef7c0d624f1..30cbf9f264f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -437,7 +437,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
UpdateActiveSnapshotCommandId();
/* Create a QueryDesc, redirecting output to our tuple receiver */
- queryDesc = CreateQueryDesc(plan, queryString,
+ queryDesc = CreateQueryDesc(plan, NULL, queryString,
GetActiveSnapshot(), InvalidSnapshot,
dest, NULL, NULL, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index ec96c2efcd3..ac1ddd25aba 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ list_make1(NULL),
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 34b6410d6a2..afd449c73ba 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,6 +576,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_list;
ListCell *p;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -585,6 +587,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ int i;
if (es->memory)
{
@@ -650,14 +653,20 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_list = NIL;
/* Explain each query */
+ i = 0;
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ ExecPrep *prep = prep_list ?
+ (ExecPrep *) list_nth(prep_list, i) : NULL;
+ i++;
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..95b5ec58c55 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,10 +291,16 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Performs range
+ table initialization, permission checks, and initial partition pruning.
+ Returns an ExecPrep wrapper with EState that ExecutorStart may reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
+ CreateExecutorState (or reuse one from ExecPrep if present)
creates per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index f5f4986383d..39de0b93a1c 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -171,8 +171,26 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
*/
- estate = CreateExecutorState();
+ if (queryDesc->prep == NULL)
+ queryDesc->prep = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ Assert(queryDesc->prep);
+ estate = queryDesc->prep->prep_estate;
+
+ /*
+ * Executor is adopting the prep's EState. Mark it so ExecPrepCleanup()
+ * doesn't try to free it redundantly.
+ */
+ queryDesc->prep->owns_estate = false;
+
queryDesc->estate = estate;
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +281,136 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart().
+ *
+ * Returns an ExecPrep wrapper that owns the EState and can be reused
+ * or cleaned up later.
+ */
+ExecPrep *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ bool snapshot_set;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Pruning may use expressions that require an active snapshot. */
+ snapshot_set = false;
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snapshot_set = true;
+ }
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false -- they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ /* Release snapshot if we got one */
+ if (snapshot_set)
+ PopActiveSnapshot();
+
+ return CreateExecPrep(estate, CurrentMemoryContext, NULL, NULL);
+}
+
+/*
+ * CreateExecPrep: initialize ExecPrep wrapper with optional cleanup metadata.
+ */
+ExecPrep *
+CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg)
+{
+ ExecPrep *prep = palloc0(sizeof(ExecPrep));
+
+ prep->prep_estate = estate;
+ prep->context = context;
+ prep->cleanup = cleanup;
+ prep->cleanup_arg = cleanup_arg;
+ prep->owns_estate = true;
+
+ return prep;
+}
+
+/*
+ * ExecPrepCleanup: free ExecPrep resources not adopted by the executor.
+ *
+ * Only frees the EState if it wasn't taken over by ExecutorStart().
+ * Always runs the optional user-defined cleanup callback.
+ */
+void
+ExecPrepCleanup(ExecPrep *prep)
+{
+ if (prep == NULL)
+ return;
+
+ if (prep->prep_estate && prep->owns_estate)
+ {
+ ExecCloseRangeTableRelations(prep->prep_estate);
+ FreeExecutorState(prep->prep_estate);
+ }
+
+ if (prep->cleanup)
+ prep->cleanup(prep->cleanup_arg);
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -824,7 +972,6 @@ ExecCheckXactReadOnly(PlannedStmt *plannedstmt)
PreventCommandIfParallelMode(CreateCommandName((Node *) plannedstmt));
}
-
/* ----------------------------------------------------------------
* InitPlan
*
@@ -838,38 +985,15 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecCreatePartitionPruneStates(estate);
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->prep);
+ Assert(estate == queryDesc->prep->prep_estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index f098a5557cf..aedbd9566d6 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1281,6 +1281,7 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
+ NULL,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 61559642662..ac5e2ebee72 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -2369,6 +2369,9 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /* Wouldn't be available at ExecutorPrep() time. */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 630d708d2a3..633310c5f5b 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1362,6 +1362,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest = None_Receiver;
es->qd = CreateQueryDesc(es->stmt,
+ NULL,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 653500b38dc..7a3cb944d6f 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_list;
+ int i;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_list = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,12 +2619,17 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ i = 0;
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ ExecPrep *prep = prep_list ?
+ list_nth(prep_list, i) : NULL;
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
+ i++;
+
/*
* Reset output state. (Note that if a non-SPI receiver is used,
* _SPI_current->processed will stay zero, and that's what we'll
@@ -2690,6 +2699,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
snap = InvalidSnapshot;
qdesc = CreateQueryDesc(stmt,
+ prep,
plansource->query_string,
snap, crosscheck_snapshot,
dest,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 7dd75a490aa..5880a574a06 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1232,6 +1232,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2033,6 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index fde78c55160..82c295502b0 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -66,6 +67,7 @@ static void DoPortalRewind(Portal portal);
*/
QueryDesc *
CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
@@ -78,6 +80,7 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->operation = plannedstmt->commandType; /* operation */
qd->plannedstmt = plannedstmt; /* plan */
+ qd->prep = prep; /* executor prep output */
qd->sourceText = sourceText; /* query text */
qd->snapshot = RegisterSnapshot(snapshot); /* snapshot */
/* RI check snapshot */
@@ -112,6 +115,13 @@ FreeQueryDesc(QueryDesc *qdesc)
UnregisterSnapshot(qdesc->snapshot);
UnregisterSnapshot(qdesc->crosscheck_snapshot);
+ /* ExecPrep cleanup if necessary */
+ if (qdesc->prep)
+ {
+ ExecPrepCleanup(qdesc->prep);
+ qdesc->prep = NULL;
+ }
+
/* Only the QueryDesc itself need be freed */
pfree(qdesc);
}
@@ -123,6 +133,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep: ExecPrep for the plan (output of ExecutorPrep())
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +146,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ ExecPrep *prep,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -146,7 +158,7 @@ ProcessQuery(PlannedStmt *plan,
/*
* Create the QueryDesc object
*/
- queryDesc = CreateQueryDesc(plan, sourceText,
+ queryDesc = CreateQueryDesc(plan, prep, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
dest, params, queryEnv, 0);
@@ -489,6 +501,9 @@ PortalStart(Portal portal, ParamListInfo params,
* the destination to DestNone.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->preps ?
+ (ExecPrep *) linitial(portal->preps) :
+ NULL,
portal->sourceText,
GetActiveSnapshot(),
InvalidSnapshot,
@@ -1185,6 +1200,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ int i;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1221,14 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ i = 0;
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ ExecPrep *prep = portal->preps ?
+ list_nth(portal->preps, i) : NULL;
+
+ i++;
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1286,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1295,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 943da087c9f..313f8ef2fdc 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -298,6 +299,7 @@ PortalDefineQuery(Portal portal,
portal->qc.nprocessed = 0;
portal->commandTag = commandTag;
portal->stmts = stmts;
+ portal->preps = preps;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 6e51d50efc7..6aa8b275aa2 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, ExecPrep *prep,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 86db3dc8d0d..c18530f5d11 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -18,7 +18,6 @@
#include "nodes/execnodes.h"
#include "tcop/dest.h"
-
/* ----------------
* query descriptor:
*
@@ -35,6 +34,7 @@ typedef struct QueryDesc
/* These fields are provided by CreateQueryDesc */
CmdType operation; /* CMD_SELECT, CMD_UPDATE, etc. */
PlannedStmt *plannedstmt; /* planner's output (could be utility, too) */
+ ExecPrep *prep; /* output of ExecutorPrep() or NULL */
const char *sourceText; /* source text of the query */
Snapshot snapshot; /* snapshot to use for query */
Snapshot crosscheck_snapshot; /* crosscheck for RI update/delete */
@@ -57,6 +57,7 @@ typedef struct QueryDesc
/* in pquery.c */
extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
+ ExecPrep *prep,
const char *sourceText,
Snapshot snapshot,
Snapshot crosscheck_snapshot,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fa2b657fb2f..3579926d4e8 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,16 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern ExecPrep *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning,
+ int eflags);
+extern ExecPrep *CreateExecPrep(EState *estate, MemoryContext context,
+ execprep_cleanup_fn cleanup, void *cleanup_arg);
+extern void ExecPrepCleanup(ExecPrep *prep);
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 18ae8f0d4bb..8bdecd631bf 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -772,6 +772,54 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
+/*
+ * ExecPrep: encapsulates executor preparation results for a PlannedStmt.
+ *
+ * ExecutorPrep() factors out executor setup steps such as initializing the
+ * range table, checking permissions, and executing initial partition pruning.
+ * ExecutorStart() can reuse the prepared EState instead of repeating that
+ * work, and other callers (such as plan cache validation) can use it without
+ * running the full plan.
+ */
+
+/*
+ * Optional callback to clean up user-specific resources associated with
+ * ExecPrep.
+ */
+typedef void (*execprep_cleanup_fn)(void *prep);
+
+typedef struct ExecPrep
+{
+ /*
+ * Context in which this struct and all subsidiary allocations were made.
+ * This context must remain alive until ExecPrepCleanup is called.
+ */
+ MemoryContext context;
+
+ /*
+ * Partially-initialized executor state used for permission checks and
+ * pruning. May be adopted directly by ExecutorStart(), in which case
+ * ExecPrepCleanup will skip freeing it.
+ */
+ EState *prep_estate;
+
+ /*
+ * True if ExecPrepCleanup() must free the EState. If the executor adopts
+ * prep_estate, this is set to false to avoid double-free.
+ */
+ bool owns_estate;
+
+ /*
+ * Optional caller-supplied cleanup hook to run during ExecPrepCleanup.
+ * Useful for releasing external resources associated with the prep.
+ */
+ execprep_cleanup_fn cleanup;
+
+ /*
+ * Opaque pointer to pass to the cleanup hook.
+ */
+ void *cleanup_arg;
+} ExecPrep;
/*
* ExecRowMark -
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index 5ffa6fd5cc8..013bcc3bd8e 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *preps; /* list of ExecPreps where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *preps,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
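The ownership rule that ExecPrepCleanup() follows in the patch above (free the EState only while owns_estate is still true, but always run the caller's cleanup hook) can be modelled outside the backend. What follows is only a minimal standalone sketch, not the patch's code: ToyEState, ToyPrep and every toy_* function are hypothetical names, plain calloc/free stand in for PostgreSQL memory contexts, and toy_adopt_estate() is an assumed stand-in for whatever ExecutorStart() does when it takes over prep_estate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for EState and ExecPrep; not PostgreSQL types. */
typedef void (*toy_cleanup_fn) (void *arg);

typedef struct ToyEState
{
	int			dummy;
} ToyEState;

typedef struct ToyPrep
{
	ToyEState  *estate;
	bool		owns_estate;	/* true until someone adopts the estate */
	toy_cleanup_fn cleanup;		/* optional caller-supplied hook */
	void	   *cleanup_arg;
} ToyPrep;

/* Counts how many estates the cleanup routine has freed. */
int			toy_estates_freed = 0;

/* Build a prep that initially owns a freshly allocated estate. */
ToyPrep *
toy_create_prep(toy_cleanup_fn cleanup, void *cleanup_arg)
{
	ToyPrep    *prep = calloc(1, sizeof(ToyPrep));

	assert(prep != NULL);
	prep->estate = calloc(1, sizeof(ToyEState));
	assert(prep->estate != NULL);
	prep->owns_estate = true;
	prep->cleanup = cleanup;
	prep->cleanup_arg = cleanup_arg;
	return prep;
}

/* Assumed adoption step: the adopter now owns (and must free) the estate. */
ToyEState *
toy_adopt_estate(ToyPrep *prep)
{
	prep->owns_estate = false;
	return prep->estate;
}

/* Free the estate only if still owned; always run the hook if there is one. */
void
toy_prep_cleanup(ToyPrep *prep)
{
	if (prep == NULL)
		return;

	if (prep->estate != NULL && prep->owns_estate)
	{
		free(prep->estate);
		prep->estate = NULL;
		toy_estates_freed++;
	}

	if (prep->cleanup != NULL)
		prep->cleanup(prep->cleanup_arg);
}

/* Example hook: bump an int counter passed through cleanup_arg. */
void
toy_count_cleanup(void *arg)
{
	(*(int *) arg)++;
}
```

In this model a prep that was never adopted has its estate freed by toy_prep_cleanup(), while an adopted one leaves the estate for the adopter to free; the hook runs in both cases. The ToyPrep struct itself is left for the caller to release, loosely mirroring how the patch leaves the ExecPrep to its memory context.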
[application/octet-stream] v4-0003-Reuse-partition-pruning-results-in-parallel-worke.patch (9.1K, 3-v4-0003-Reuse-partition-pruning-results-in-parallel-worke.patch)
From 695b2d630d1e0812de9e3d227a56fadf21a8b61a Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v4 3/6] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce ExecCheckInitialPruningResults() to verify that the results
match what the worker would compute. This check helps catch
inconsistencies across leader and worker pruning logic.
While valuable on its own, this change also lays the foundation for
future optimizations where the leader may take locks only on
surviving partitions. Ensuring that workers follow identical pruning
decisions makes such selective locking safe.
---
src/backend/executor/execParallel.c | 67 +++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 35 +++++++++++++++
src/include/executor/execPartition.h | 1 +
3 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index aedbd9566d6..751590adcc9 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -65,6 +66,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -608,12 +611,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -642,6 +651,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -668,6 +679,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -769,6 +790,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1263,10 +1294,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ ExecPrep *prep = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1279,9 +1315,38 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false, 0);
+ Assert(prep->prep_estate);
+
+ prep->prep_estate->es_part_prune_results = part_prune_results;
+ prep->prep_estate->es_unpruned_relids =
+ bms_add_members(prep->prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * Verify that the pruning results passed from the leader match
+ * what the worker would independently compute.
+ */
+ ExecCheckInitialPruningResults(prep->prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
- NULL,
+ prep,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options);
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index ac5e2ebee72..dc4eac8a0a7 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1873,6 +1873,41 @@ ExecDoInitialPruning(EState *estate)
}
}
+/*
+ * ExecCheckInitialPruningResults
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ */
+void
+ExecCheckInitialPruningResults(EState *estate)
+{
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (bms_nonempty_difference(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unpruned_relids in parallel worker");
+ }
+}
+
/*
* ExecInitPartitionExecPruning
* Initialize the data structures needed for runtime "exec" partition
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index ba8cc594fc9..126efd008e5 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -132,6 +132,7 @@ typedef struct PartitionPruneState
extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
+extern void ExecCheckInitialPruningResults(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
int part_prune_index,
--
2.47.3
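The consistency check in ExecCheckInitialPruningResults() above relies on bms_nonempty_difference(a, b), which is true when a contains a member that b lacks. A minimal standalone sketch of that comparison, using 64-bit masks in place of Bitmapsets, may make its direction easier to see. The toy_* names are hypothetical, and the arrays stand in for the parallel lists of per-PartitionPruneInfo results; this is an illustration under those assumptions, not the patch's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if set 'a' has at least one member that set 'b' does not. */
bool
toy_nonempty_difference(uint64_t a, uint64_t b)
{
	return (a & ~b) != 0;
}

/*
 * Compare the worker's recomputed results against the leader's, entry by
 * entry.  Returns the index of the first entry where the worker kept a
 * subplan the leader had pruned, or -1 if no such entry exists.
 */
int
toy_check_pruning_results(const uint64_t *worker,
						  const uint64_t *leader,
						  size_t nentries)
{
	for (size_t i = 0; i < nentries; i++)
	{
		if (toy_nonempty_difference(worker[i], leader[i]))
			return (int) i;
	}
	return -1;
}
```

Note that a check of this shape is one-directional: it reports a worker that would scan a subplan the leader pruned, but not a worker that would prune more than the leader did.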
[application/octet-stream] v4-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (6.7K, 4-v4-0006-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From 5dc90ce54c7108d5335003da4f247a65803e42e7 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Mon, 17 Nov 2025 17:40:26 +0900
Subject: [PATCH v4 6/6] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
ExecPrep pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.
This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users, which a later
patch will use to support pruning-aware locking.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 32 +++++++++++++++++++++++--
src/test/regress/expected/plancache.out | 30 +++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 27 +++++++++++++++++++++
3 files changed, 87 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index d81718ea84e..ed7352fce61 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ ExecPrep *prep; /* ExecutorPrep() output for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ int i;
/*
* Clean up after previous query, if there was one.
@@ -695,11 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -720,9 +732,12 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ i = 0;
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ ExecPrep *prep = cprep.prep_list ? list_nth(cprep.prep_list, i++) :
+ NULL;
execution_state *newes;
/*
@@ -764,6 +779,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep = prep;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1362,8 +1378,20 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ if (es->prep)
+ {
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per-call lifetime.
+ */
+ EState *prep_estate = es->prep->prep_estate;
+
+ MemoryContextSetParent(prep_estate->es_query_cxt, fcache->subcontext);
+ }
+
es->qd = CreateQueryDesc(es->stmt,
- NULL,
+ es->prep,
fcache->func->src,
GetActiveSnapshot(),
InvalidSnapshot,
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 26c4c5e10fd..bf937364716 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -458,4 +458,34 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index cc7eb4da4d3..71320799040 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -272,4 +272,31 @@ explain (verbose, costs off) execute inval_during_pruning_q;
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+
reset plan_cache_mode;
--
2.47.3
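Several call sites in these patches walk a statement list alongside an optional prep list, taking the i-th prep when a list exists and NULL otherwise (for instance when a rule rewrite turns one UPDATE into two PlannedStmts, as in the regression test above). A minimal standalone sketch of that lookup follows, with a NULL array pointer playing the role of NIL. ToyPrep and toy_prep_for_stmt() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for ExecPrep. */
typedef struct ToyPrep
{
	int			stmt_id;
} ToyPrep;

/*
 * Return the prep entry for statement i, or NULL when the caller has no
 * prep list at all.  When a list is present it is assumed to be parallel
 * to the statement list, with one entry per statement.
 */
const ToyPrep *
toy_prep_for_stmt(const ToyPrep *const *prep_list, size_t nstmts, size_t i)
{
	assert(i < nstmts);
	return prep_list != NULL ? prep_list[i] : NULL;
}
```

Treating a missing list as "no prep for any statement" lets callers that never ran the prep step share the same loop as callers that did.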
[application/octet-stream] v4-0004-Use-pruning-aware-locking-in-cached-plans.patch (24.5K, 5-v4-0004-Use-pruning-aware-locking-in-cached-plans.patch)
From f3c07bcc5a14a0b751d82771c97c95775cea2758 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v4 4/6] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry ExecutorPrep results
through the plan caching layer. Adjust call sites in SPI,
functions, portals, and EXPLAIN to propagate this data.
This ensures pruning decisions made during initial pruning are
consistently reused without redoing pruning logic in executor paths
like parallel workers. It also lays the groundwork for
pruning-dependent lock behavior during plan reuse.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
---
src/backend/commands/prepare.c | 19 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 4 +-
src/backend/executor/spi.c | 26 ++-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 3 +
src/backend/tcop/postgres.c | 9 +-
src/backend/utils/cache/plancache.c | 234 ++++++++++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 ++
src/include/utils/plancache.h | 24 ++-
11 files changed, 313 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index afd449c73ba..23332d19b37 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ /* Keep ExecutorPrep state with the portal and its resowner. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +209,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_list,
cplan);
/*
@@ -575,6 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_list;
ListCell *p;
@@ -633,8 +638,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ /* ExecutorPrep state is local to this EXPLAIN EXECUTE call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/* Explain each query */
i = 0;
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 633310c5f5b..d81718ea84e 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -698,6 +698,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index e44f1223886..7de2328021b 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4671,8 +4671,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 7a3cb944d6f..d580f1e0425 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ /* ExecutorPrep state lives in this portal's context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1690,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_list, /* lives in portalContext */
cplan);
/*
@@ -2078,6 +2083,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2107,13 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ /* ExecutorPrep() state lives in caller's active context. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2501,6 +2511,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ CachedPlanPrepData cprep = {0};
List *prep_list;
int i;
@@ -2577,11 +2588,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+
+ /* ExecutorPrep state is per _SPI_execute_plan call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_list = NIL;
+ prep_list = cprep.prep_list;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index c4fd646b999..4c76e78c1da 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -608,6 +608,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ccdc9bc264a..229b39060ae 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -1274,6 +1274,9 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
lappend_int(root->glob->resultRelations,
splan->rootRelation);
}
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels,
+ linitial_int(splan->resultRelations));
}
break;
case T_Append:
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 5880a574a06..a96419edcbe 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1639,6 +1639,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2021,7 +2022,11 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+
+ /* ExecutorPrep() state lives in portal context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2034,7 +2039,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_list,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 6661d2c6b73..c1cfd47422c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,7 +93,7 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
@@ -101,6 +101,8 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -137,6 +139,26 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
+/*
+ * Lock acquisition policy for execution locks.
+ *
+ * LOCK_ALL acquires locks on all relations mentioned in the plan,
+ * reproducing the behavior of AcquireExecutorLocks().
+ *
+ * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
+ * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
+ * partitions remaining after performing initial pruning.
+ */
+typedef enum LockPolicy
+{
+ LOCK_ALL,
+ LOCK_UNPRUNED,
+} LockPolicy;
+
+static void AcquireExecutorLocksWithPolicy(List *stmt_list,
+ LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep);
+
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -938,7 +960,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ *
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The resulting ExecPrep structures are saved in cprep for
+ * later reuse by ExecutorStart().
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
@@ -947,7 +974,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -975,13 +1002,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1003,7 +1032,7 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
}
/*
@@ -1283,6 +1312,10 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the results in
+ * cprep->prep_list. These are intended to be passed later to ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1293,7 +1326,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1315,7 +1349,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (PrepAndCheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1902,6 +1938,38 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
+/*
+ * AcquireExecutorLocksWithPolicy
+ * Acquire or release execution locks for a cached plan according to
+ * the specified policy.
+ *
+ * LOCK_ALL reproduces AcquireExecutorLocks(), locking every relation in
+ * each PlannedStmt's rtable. LOCK_UNPRUNED restricts locking to the
+ * unprunable rels and partitions that survive initial runtime pruning.
+ *
+ * When LOCK_UNPRUNED is used on acquire, ExecutorPrep() is invoked for
+ * each PlannedStmt and the resulting ExecPrep pointers are appended to
+ * cprep->prep_list in cprep->context. On release, the same ExecPrep
+ * list is consulted to determine which relations to unlock and is then
+ * cleaned up with ExecPrepCleanup().
+ */
+static void
+AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ switch (policy)
+ {
+ case LOCK_ALL:
+ AcquireExecutorLocks(stmt_list, acquire);
+ break;
+ case LOCK_UNPRUNED:
+ AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
+ break;
+ default:
+ elog(ERROR, "invalid LockPolicy");
+ }
+}
+
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
@@ -1954,6 +2022,158 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the ExecPrep pointer for each PlannedStmt to cprep->prep_list
+ *
+ * On release, it:
+ * - looks up the ExecPrep object for each PlannedStmt from cprep->prep_list
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ * - calls ExecPrepCleanup() on each ExecPrep
+ *
+ * prep_list is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for ExecPrep happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_list;
+ int i;
+
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+ /*
+ * When releasing locks, use the ExecPrep list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_list = cprep->prep_list;
+ Assert(acquire || list_length(prep_list) == list_length(stmt_list));
+
+ i = 0;
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ ExecPrep *prep;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocks(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_list = lappend(cprep->prep_list, NULL);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ prep = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true,
+ cprep->eflags);
+ Assert(prep || plannedstmt->partPruneInfos == NULL);
+ cprep->prep_list = lappend(cprep->prep_list, prep);
+ }
+ else
+ prep = list_nth(prep_list, i++);
+
+ Assert(prep == NULL || prep->prep_estate);
+ if (prep)
+ {
+ EState *prep_estate = prep->prep_estate;
+
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * firstResultRels may contain pruned partitions that must still be
+ * locked to satisfy executor assumptions (see comments in
+ * ExecInitModifyTable()). Ensure they're included here.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+
+ /* Clean up prep if releasing locks. */
+ if (!acquire)
+ ExecPrepCleanup(prep);
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 46a8655621d..5af4c31f53a 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -141,6 +141,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index c4393a94321..eb211f1ba56 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index a82b66d4bc2..c7b8ec4be39 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,27 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_list is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. The
+ * same list is later used to determine which relations to unlock when
+ * releasing execution locks.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_list; /* one ExecPrep per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate ExecPrep objects */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to pass to ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +261,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
--
2.47.3
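The relid arithmetic in AcquireExecutorLocksUnpruned() above can be sketched in isolation. This is a minimal illustration, not code from the patch: it assumes a 64-bit mask (the hypothetical RelidMask type) in place of PostgreSQL's Bitmapset, and relid_bit() and compute_extra_lock_relids() are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Bitmapset: bit N set means RT index N is a member. */
typedef uint64_t RelidMask;

static RelidMask
relid_bit(int rtindex)
{
	/* The toy mask only has room for RT indexes 0..63. */
	assert(rtindex >= 0 && rtindex < 64);
	return (RelidMask) 1 << rtindex;
}

/*
 * Mirrors the set arithmetic of the patch: start from the relids that
 * survived initial pruning, drop the unprunable ones (they are locked in
 * a separate step), then add each ModifyTable node's first result
 * relation whether or not pruning removed it.
 */
static RelidMask
compute_extra_lock_relids(RelidMask unpruned, RelidMask unprunable,
						  const int *first_result_rels, int nfirst)
{
	RelidMask	lock_relids = unpruned & ~unprunable;

	for (int i = 0; i < nfirst; i++)
		lock_relids |= relid_bit(first_result_rels[i]);

	return lock_relids;
}
```

With RT index 1 unprunable, index 3 surviving pruning, and index 2 a pruned first result relation, the function returns a mask holding indexes 2 and 3. Those are the relations the second LockRelids() call would cover in addition to the unprunable set.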
[application/octet-stream] v4-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch (9.3K, 6-v4-0005-Add-test-exercising-prep-cleanup-on-cached-plan-i.patch)
From 774853b8d3c0f8d4ee1afc8329526e7d22987cab Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 20 Nov 2025 15:35:47 +0900
Subject: [PATCH v4 5/6] Add test exercising prep cleanup on cached-plan
invalidation
Add a regression test that causes a generic plan to become invalid
while pruning-aware setup is running. The pruning expression calls a
function that can perform DDL on a partition, making the plan stale
during reuse.
The test's purpose is to drive execution through the invalidation
path that discards any ExecutorPrep state created before the plan was
found invalid, providing coverage for that cleanup logic.
---
src/backend/utils/cache/plancache.c | 38 +++++++++++++--
src/test/regress/expected/plancache.out | 61 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 50 ++++++++++++++++++++
3 files changed, 144 insertions(+), 5 deletions(-)
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index c1cfd47422c..a9a4e11d1a5 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -103,6 +103,7 @@ static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -1033,6 +1034,9 @@ PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
/* Oops, the race case happened. Release useless locks. */
AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -2069,7 +2073,6 @@ LockRelids(List *rtable, Bitmapset *relids, bool acquire)
* - looks up the ExecPrep object for each PlannedStmt from cprep->prep_list
* (which must already be populated)
* - unlocks the same relations identified during acquire
- * - calls ExecPrepCleanup() on each ExecPrep
*
* prep_list is extended during acquire and must match stmt_list one-to-one
* when releasing locks. Memory allocation for ExecPrep happens in
@@ -2165,15 +2168,40 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
LockRelids(plannedstmt->rtable, lock_relids, acquire);
bms_free(lock_relids);
}
-
- /* Clean up prep if releasing locks. */
- if (!acquire)
- ExecPrepCleanup(prep);
}
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * CachedPlanPrepCleanup
+ * Clean up ExecPrep state built for a generic plan.
+ *
+ * This is used in the corner case where PrepAndCheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the ExecPrep list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+
+ if (cprep == NULL)
+ return;
+
+ foreach(lc, cprep->prep_list)
+ {
+ ExecPrep *prep = (ExecPrep *) lfirst(lc);
+
+ ExecPrepCleanup(prep);
+ }
+
+ list_free(cprep->prep_list);
+ cprep->prep_list = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..26c4c5e10fd 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,64 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..cc7eb4da4d3 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,53 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
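The cleanup step that this patch moves into CachedPlanPrepCleanup() can be sketched standalone. This is an illustration under stated assumptions, not the patch's code: DemoPrep, DemoPrepData and demo_prep_cleanup() are hypothetical stand-ins for ExecPrep, CachedPlanPrepData and the real cleanup function. The preceding patch can store NULL entries in prep_list, and whether ExecPrepCleanup() accepts NULL is not visible in this excerpt, so the sketch skips NULL slots explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for ExecPrep; it only records that cleanup ran. */
typedef struct DemoPrep
{
	int			cleaned;
} DemoPrep;

/* Hypothetical stand-in for CachedPlanPrepData's prep_list. */
typedef struct DemoPrepData
{
	DemoPrep  **preps;			/* one slot per statement; may be NULL */
	int			nprep;
} DemoPrepData;

/*
 * Clean up every non-NULL entry, then reset the container so that a
 * second call is a harmless no-op.  Returns the number of entries
 * that were cleaned.
 */
static int
demo_prep_cleanup(DemoPrepData *data)
{
	int			ncleaned = 0;

	if (data == NULL)
		return 0;

	assert(data->nprep >= 0);
	for (int i = 0; i < data->nprep; i++)
	{
		if (data->preps[i] == NULL)
			continue;			/* nothing was prepared for this slot */
		data->preps[i]->cleaned = 1;
		ncleaned++;
	}

	free(data->preps);
	data->preps = NULL;
	data->nprep = 0;
	return ncleaned;
}
```

Resetting the container after cleanup follows the patch's own `cprep->prep_list = NIL`, so a caller that goes on to replan does not see stale entries.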
[application/octet-stream] v4-0001-Refactor-partition-pruning-initialization-for-cla.patch (8.2K, 7-v4-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 2d7e972bf0e772b55674d6c390682777dc8c99a3 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:18:24 +0900
Subject: [PATCH v4 1/6] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Also simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execPartition.c | 70 +++++++++++++++++-----------
src/include/executor/execPartition.h | 1 +
3 files changed, 44 insertions(+), 28 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 27c9eec697b..f5f4986383d 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -868,6 +868,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
*/
+ ExecCreatePartitionPruneStates(estate);
ExecDoInitialPruning(estate);
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 0dcce181f09..61559642662 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -182,8 +182,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1773,6 +1772,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1797,6 +1799,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1804,11 +1829,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1826,20 +1851,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -1847,8 +1865,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -1966,14 +1982,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2207,8 +2221,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2220,8 +2234,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 3b3f46aced0..ba8cc594fc9 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2026-03-07 09:54 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2026-03-07 09:54 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Hi,
Attached is v6 of the patch series. I've been working toward
committing this, so I wanted to lay out the ExecutorPrep() design and
the key trade-offs before doing so.
When a cached generic plan references a partitioned table,
GetCachedPlan() locks all partitions upfront via
AcquireExecutorLocks(), even those that initial pruning will
eliminate. But initial partition pruning only runs later during
ExecutorStart(). Moving pruning earlier requires some executor setup
(range table, permissions, pruning state), and ExecutorPrep() is the
vehicle for that. Unlike the approach reverted last May, this
keeps the CachedPlan itself unchanged -- all per-execution state flows
through a separate CachedPlanPrepData that the caller provides.
The approach also keeps GetCachedPlan()'s interface
backward-compatible: the new CachedPlanPrepData argument is optional.
If a caller passes NULL, all partitions are locked as before and
nothing changes. This means existing callers and any new code that
calls GetCachedPlan() without caring about pruning-aware locking just
works.
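As a rough illustration of that opt-in shape (a toy model, not code from
the patch), the sketch below uses hypothetical names throughout:
ToyPrepData, ToyLocks and toy_get_cached_plan() are stand-ins invented
here, and "pruning" is reduced to keeping the single partition whose
number equals the parameter. It only shows that passing NULL locks
everything, while passing a prep struct locks the survivors and hands the
pruning result back to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPARTS 4            /* hypothetical partition count */

/* Hypothetical stand-in for CachedPlanPrepData: carries the pruning result. */
typedef struct ToyPrepData
{
    bool        filled;                 /* true once pruning has run */
    bool        survives[TOY_NPARTS];   /* partitions left after pruning */
} ToyPrepData;

/* Hypothetical stand-in for the backend's lock state. */
typedef struct ToyLocks
{
    bool        held[TOY_NPARTS];
} ToyLocks;

/*
 * Toy analogue of GetCachedPlan() for a plan over TOY_NPARTS partitions
 * where "initial pruning" keeps only the partition numbered 'param'.
 * Returns the number of partitions locked.
 */
static int
toy_get_cached_plan(int param, ToyLocks *locks, ToyPrepData *prep)
{
    int         nlocked = 0;

    assert(locks != NULL);

    for (int i = 0; i < TOY_NPARTS; i++)
    {
        /* prep == NULL: old behavior, every partition is locked. */
        bool        keep = (prep == NULL) || (i == param);

        if (prep != NULL)
            prep->survives[i] = keep;
        if (keep)
        {
            locks->held[i] = true;
            nlocked++;
        }
    }
    if (prep != NULL)
        prep->filled = true;

    return nlocked;
}
```

In this model, toy_get_cached_plan(2, &locks, NULL) locks all four
partitions, whereas the same call with a ToyPrepData locks only
partition 2 and records the result for later reuse.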
The risk is on the other side: if a caller does pass a
CachedPlanPrepData, GetCachedPlan() will lock only the surviving
partitions and populate prep_estates with the EStates that
ExecutorPrep() created. The caller then must make those EStates
available to ExecutorStart() -- via QueryDesc->estate,
portal->prep_estates, or the equivalent path for SPI and SQL
functions. If it fails to do so, ExecutorStart() will call
ExecutorPrep() again, which may compute different pruning results than
the original call, potentially expecting locks on relations that were
never acquired. The executor would then operate on relations it
doesn't hold locks on.
So the contract is: if you opt in to pruning-aware locking by passing
CachedPlanPrepData, you must complete the pipeline by delivering the
prep EStates to the executor. In the current patch, all the call sites
that pass a CachedPlanPrepData (portals, SPI, EXECUTE, SQL functions,
EXPLAIN) do thread the EStates through correctly, and I've tried to
make the plumbing straightforward enough that it's hard to get wrong.
But it is a new invariant that didn't exist before, and a caller that
gets it wrong would fail silently rather than with an obvious error.
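The failure mode described above can be sketched with a small toy model
(again not code from the patch). All names here (ToyPrepState,
toy_prune(), toy_lock_unpruned(), toy_executor_start()) are hypothetical,
and the assumption that the pruning input can differ between the two runs
is made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPARTS 4            /* hypothetical partition count */

/* Hypothetical pruning result produced at lock time. */
typedef struct ToyPrepState
{
    bool        survives[TOY_NPARTS];
} ToyPrepState;

/* "Initial pruning": only partition 'value' survives. */
static void
toy_prune(int value, ToyPrepState *out)
{
    for (int i = 0; i < TOY_NPARTS; i++)
        out->survives[i] = (i == value);
}

/* Lock phase: prune once and lock only the survivors. */
static void
toy_lock_unpruned(int value, ToyPrepState *prep, bool held[TOY_NPARTS])
{
    assert(prep != NULL);

    toy_prune(value, prep);
    for (int i = 0; i < TOY_NPARTS; i++)
        held[i] = prep->survives[i];
}

/*
 * Executor start: reuse 'prep' when the caller delivers it; otherwise
 * prune again with whatever 'value_now' is.  Returns how many
 * partitions would be scanned without a lock.
 */
static int
toy_executor_start(int value_now, const ToyPrepState *prep,
                   const bool held[TOY_NPARTS])
{
    ToyPrepState redo;
    int         unlocked = 0;

    if (prep == NULL)
    {
        toy_prune(value_now, &redo);    /* second, independent pruning run */
        prep = &redo;
    }
    for (int i = 0; i < TOY_NPARTS; i++)
    {
        if (prep->survives[i] && !held[i])
            unlocked++;
    }
    return unlocked;
}
```

When the lock phase pruned with value 1 and the caller delivers the prep
state, no unlocked partition is touched regardless of the current value.
When the caller drops it and the second pruning run sees value 2, the toy
executor ends up on partition 2 without a lock, and nothing in the
non-assert path reports it.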
To catch such violations, I've added a debug-only check in
standard_ExecutorStart() that fires when no prep EState was provided.
It iterates over the plan's rtable and verifies that every lockable
relation is actually locked. That should always hold if
AcquireExecutorLocks() locked everything, but it would fail if
pruning-aware locking happened upstream and the caller dropped the
prep EState. The check is skipped in parallel workers, which acquire
relation locks lazily in ExecGetRangeTableRelation().
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
So the invariant is: if no prep EState was provided, every relation in
the plan is locked; if one was provided, at least the unpruned
relations are locked. Both are checked in assert builds.
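Stated as a single predicate, the two-branch invariant might look like
the sketch below. This is a simplified stand-alone model rather than the
patch's code: toy_locks_ok() and the plain bool arrays are hypothetical
replacements for the rtable walk, es_unpruned_relids and
CheckRelationOidLockedByMe().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NRELS 4             /* hypothetical number of plan relations */

/*
 * Returns true when the locking invariant holds.  'unpruned' is only
 * consulted when have_prep is true, so it may be NULL otherwise.
 */
static bool
toy_locks_ok(bool have_prep,
             const bool unpruned[TOY_NRELS],
             const bool locked[TOY_NRELS])
{
    assert(!have_prep || unpruned != NULL);

    for (int i = 0; i < TOY_NRELS; i++)
    {
        /* Without prep state every relation must be locked. */
        bool        must_be_locked = have_prep ? unpruned[i] : true;

        if (must_be_locked && !locked[i])
            return false;
    }
    return true;
}
```

A lock set covering only the unpruned relations passes when prep state is
present and fails when it is absent, which is the case the debug-only
check is meant to catch.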
I think this covers the main concerns, but I may be missing something.
If anyone sees a problem with this approach, I'd like to hear about
it.
--
Thanks,
Amit Langote
Attachments:
[application/octet-stream] v6-0004-Use-pruning-aware-locking-in-cached-plans.patch (37.7K, 2-v6-0004-Use-pruning-aware-locking-in-cached-plans.patch)
From 800949bf7a327a7b8bfc5b9fbcdbf0ac39106056 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v6 4/6] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry the EStates created by
ExecutorPrep() through the plan caching layer. The prep_estates
list is indexed one-to-one with CachedPlan->stmt_list and is
populated when GetCachedPlan() prepares a reused generic plan.
Adjust call sites in SPI, functions, portals, and EXPLAIN to
propagate this data.
Partition pruning expressions may call PL functions that require
an active snapshot (e.g., via EnsurePortalSnapshotExists()).
AcquireExecutorLocksUnpruned() establishes one before calling
ExecutorPrep() if needed, ensuring these expressions can execute
correctly during plan cache validation.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
Add a regression test that causes a generic plan to become invalid
while pruning-aware setup is running. The pruning expression calls a
function that can perform DDL on a partition, making the plan stale
during reuse.
The test's purpose is to drive execution through the invalidation
path that discards any ExecutorPrep state created before the plan was
found invalid, providing coverage for that cleanup logic.
---
src/backend/commands/prepare.c | 19 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/executor/spi.c | 26 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 20 ++
src/backend/tcop/postgres.c | 9 +-
src/backend/utils/cache/plancache.c | 292 +++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 29 +-
src/test/regress/expected/partition_prune.out | 50 ++-
src/test/regress/expected/plancache.out | 62 ++++
src/test/regress/sql/partition_prune.sql | 24 +-
src/test/regress/sql/plancache.sql | 51 +++
15 files changed, 576 insertions(+), 26 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 005fbb48aa5..e8cd47131ce 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ /* Keep ExecutorPrep state with the portal and its resowner. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +209,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -575,6 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_estates;
ListCell *p;
@@ -633,8 +638,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ /* ExecutorPrep state is local to this EXPLAIN EXECUTE call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/* Explain each query */
prep_lc = list_head(prep_estates);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index c93e2664cfd..65dfae58dcf 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -698,6 +698,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 793c76d4f82..a7a4baaf8af 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4858,8 +4858,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -4873,6 +4873,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 994a69a1c8e..13703969dd8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ /* ExecutorPrep state lives in this portal's context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1690,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_estates, /* lives in portalContext */
cplan);
/*
@@ -2078,6 +2083,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2107,13 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ /* ExecutorPrep() state lives in caller's active context. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2502,6 +2512,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
ListCell *lc2;
List *prep_estates;
ListCell *prep_lc;
+ CachedPlanPrepData cprep = {0};
spicallbackarg.query = plansource->query_string;
@@ -2576,11 +2587,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+
+ /* ExecutorPrep state is per _SPI_execute_plan call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..afa61d357c5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1b5b9b5ed9c..ddb7902bc89 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,26 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of
+ * initially prunable relations. We use bms_next_member() to get
+ * the lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition
+ * expansion preserves RT index order. There is one ModifyTable
+ * per query level, so this captures exactly one entry per level.
+ * ExecInitModifyTable() asserts that the recorded index matches
+ * what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cd1e429ceed..5c145a31274 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1636,6 +1636,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2017,7 +2018,11 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+
+ /* ExecutorPrep() state lives in portal context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2030,7 +2035,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 812e2265734..be2a961a918 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,7 +93,7 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
@@ -101,6 +101,9 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -139,6 +142,26 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
+/*
+ * Lock acquisition policy for execution locks.
+ *
+ * LOCK_ALL acquires locks on all relations mentioned in the plan,
+ * reproducing the behavior of AcquireExecutorLocks().
+ *
+ * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
+ * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
+ * partitions remaining after performing initial pruning.
+ */
+typedef enum LockPolicy
+{
+ LOCK_ALL,
+ LOCK_UNPRUNED,
+} LockPolicy;
+
+static void AcquireExecutorLocksWithPolicy(List *stmt_list,
+ LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep);
+
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -940,7 +963,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ *
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The EStates created to do so are saved in cprep for
+ * later reuse by ExecutorStart().
*
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
@@ -949,7 +977,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -977,13 +1005,15 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
if (plan->is_valid)
{
+ LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
+
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1005,7 +1035,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -1285,6 +1318,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the EStates in
+ * cprep->prep_estates. These are intended to be passed later to
+ * ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1295,7 +1333,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1317,7 +1356,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (PrepAndCheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1903,6 +1944,38 @@ QueryListGetPrimaryStmt(List *stmts)
return NULL;
}
+/*
+ * AcquireExecutorLocksWithPolicy
+ * Acquire or release execution locks for a cached plan according to
+ * the specified policy.
+ *
+ * LOCK_ALL reproduces AcquireExecutorLocks(), locking every relation in
+ * each PlannedStmt's rtable. LOCK_UNPRUNED restricts locking to the
+ * unprunable rels and partitions that survive initial runtime pruning.
+ *
+ * When LOCK_UNPRUNED is used on acquire, ExecutorPrep() is invoked for
+ * each PlannedStmt and the resulting EStates are appended to
+ * cprep->prep_estates in cprep->context. On release, the same EState
+ * list is consulted to determine which relations to unlock and each
+ * EState is released.
+ */
+static void
+AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ switch (policy)
+ {
+ case LOCK_ALL:
+ AcquireExecutorLocks(stmt_list, acquire);
+ break;
+ case LOCK_UNPRUNED:
+ AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
+ break;
+ default:
+ elog(ERROR, "invalid LockPolicy");
+ }
+}
+
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
@@ -1955,6 +2028,211 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ if (!(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
+ elog(ERROR, "LockRelids(): cannot lock relation at RT index %d",
+ rtindex);
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the EState pointer for each PlannedStmt to cprep->prep_estates
+ *
+ * On release, it:
+ * - looks up the EState for each PlannedStmt from cprep->prep_estates
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ * - cleans up each EState
+ *
+ * prep_estates is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for EState happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_estates;
+ ListCell *prep_lc;
+
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+
+ /*
+ * When releasing locks, use the EState list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_estates = cprep->prep_estates;
+ Assert(acquire || list_length(prep_estates) == list_length(stmt_list));
+
+ prep_lc = list_head(prep_estates);
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ EState *prep_estate;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocks(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_estates = lappend(cprep->prep_estates, NULL);
+ else
+ (void) next_prep_estate(prep_estates, &prep_lc);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock (or unlock) partitions surviving initial runtime pruning. */
+ if (acquire)
+ {
+ /*
+ * Pruning expressions may call PL functions that require an active
+ * snapshot (e.g., via EnsurePortalSnapshotExists()). Establish one
+ * if needed.
+ */
+ bool snap_pushed = false;
+
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snap_pushed = true;
+ }
+
+ prep_estate = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true,
+ cprep->eflags);
+ Assert(prep_estate);
+ cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
+
+ if (snap_pushed)
+ PopActiveSnapshot();
+ }
+ else
+ prep_estate = next_prep_estate(prep_estates, &prep_lc);
+
+ if (prep_estate)
+ {
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * Always lock the first result relation of each ModifyTable node
+ * in the plan (listed in plannedstmt->firstResultRels), even if
+ * it was pruned, to satisfy the executor assumptions described in
+ * ExecInitModifyTable(). This can be wasteful, because the first
+ * result relation may not be needed at all if other result
+ * relations of the same node are unpruned and thus sufficient for
+ * the node's needs. Unfortunately, we don't have a per-node
+ * unpruned_relids set with which to determine that.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * CachedPlanPrepCleanup
+ * Clean up the EStates built by ExecutorPrep() for a generic plan.
+ *
+ * This is used in the corner case where PrepAndCheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the prep_estates list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+
+ if (cprep == NULL)
+ return;
+
+ foreach(lc, cprep->prep_estates)
+ {
+ EState *prep_estate = (EState *) lfirst(lc);
+
+ if (prep_estate == NULL)
+ continue;
+
+ ExecCloseRangeTableRelations(prep_estate);
+ FreeExecutorState(prep_estate);
+ }
+
+ list_free(cprep->prep_estates);
+ cprep->prep_estates = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c175ee95b68..989b3c73691 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8c9321aab8c..1431f12a6e8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 984c51515c6..da3ce9f3177 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,32 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_estates is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. If the
+ * plan is found invalid after locking, the same list is used to determine
+ * which relations to unlock before retrying.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ *
+ * eflags controls ExecutorPrep() behavior during initial pruning.
+ * Normally zero; set EXEC_FLAG_EXPLAIN_GENERIC to suppress pruning
+ * in EXPLAIN (GENERIC_PLAN). Need not match the eflags later passed
+ * to ExecutorStart().
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_estates; /* one EState (or NULL) per PlannedStmt */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate EState and its fields */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to control ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +266,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 39dab8fcc05..39770f3b6d6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4860,9 +4860,7 @@ select c.relname
relname
--------------
prunelock_p1
- prunelock_p2
- prunelock_p3
-(3 rows)
+(1 row)
commit;
deallocate prunelock_q;
@@ -4904,6 +4902,50 @@ select c.relname
commit;
deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+execute prunelock_mt_q(4, 5);
+deallocate prunelock_mt_q;
drop table prunelock_p;
reset plan_cache_mode;
-reset enable_partition_pruning;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..1d69ab0a1c2 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,65 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 229c5eb370c..87672ad40f7 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1499,6 +1499,28 @@ select c.relname
commit;
deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
drop table prunelock_p;
reset plan_cache_mode;
-reset enable_partition_pruning;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..139b4688fd6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,54 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+
+reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v6-0003-Add-test-for-partition-lock-behavior-with-generic.patch (5.3K, 3-v6-0003-Add-test-for-partition-lock-behavior-with-generic.patch)
download | inline diff:
From 58179bd0d3730dbd1fdbb0bd9c624dc7ae770830 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:00:32 +0900
Subject: [PATCH v6 3/6] Add test for partition lock behavior with generic
cached plans
Add a regression test that inspects pg_locks to verify which child
partitions are locked when executing a prepared statement that uses
a generic cached plan.
Two cases are tested: one with enable_partition_pruning on and one
with it off. Currently both cases lock all child partitions, because
GetCachedPlan() acquires execution locks on every relation in the
plan regardless of pruning.
A subsequent commit that adds pruning-aware locking will update the
expected output for the pruning-enabled case, showing that only the
surviving partition is locked.
---
src/test/regress/expected/partition_prune.out | 83 +++++++++++++++++++
src/test/regress/sql/partition_prune.sql | 55 ++++++++++++
2 files changed, 138 insertions(+)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..39dab8fcc05 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,86 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+drop table prunelock_p;
+reset plan_cache_mode;
+reset enable_partition_pruning;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..229c5eb370c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,58 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+drop table prunelock_p;
+reset plan_cache_mode;
+reset enable_partition_pruning;
--
2.47.3
[application/octet-stream] v6-0006-Reuse-partition-pruning-results-in-parallel-worke.patch (15.9K, 4-v6-0006-Reuse-partition-pruning-results-in-parallel-worke.patch)
download | inline diff:
From dc2cfc32410792b3f00422c07623f989901ee34b Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v6 6/6] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce CheckInitialPruningResultsInWorker() (debug builds only)
to verify that the results match what the worker would compute. This
check helps catch inconsistencies between leader and worker pruning
logic.
---
src/backend/executor/execParallel.c | 108 +++++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 95 +++++++-----------------
2 files changed, 133 insertions(+), 70 deletions(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 024780d3516..d337bf8c081 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -67,6 +68,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -141,6 +144,8 @@ static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
/* Helper function that runs in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static void CheckInitialPruningResultsInWorker(EState *estate);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -620,12 +625,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -654,6 +665,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -680,6 +693,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -781,6 +804,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1280,10 +1313,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ EState *prep_estate = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1296,12 +1334,80 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep_estate = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false, 0);
+ Assert(prep_estate);
+
+ prep_estate->es_part_prune_results = part_prune_results;
+ prep_estate->es_unpruned_relids =
+ bms_add_members(prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * A debug-build-only check that the pruning results passed from the
+ * leader match what the worker would independently compute.
+ */
+ CheckInitialPruningResultsInWorker(prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options,
- NULL);
+ prep_estate);
+}
+
+/*
+ * CheckInitialPruningResultsInWorker
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ *
+ * Only performed in debug builds.
+ */
+static void
+CheckInitialPruningResultsInWorker(EState *estate)
+{
+#ifdef USE_ASSERT_CHECKING
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (!bms_equal(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unpruned_relids in parallel worker");
+ }
+#endif
}
/*
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index be2a961a918..1d3244307da 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,14 +93,14 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
+static bool CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksAll(List *stmt_list, bool acquire);
static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
CachedPlanPrepData *cprep);
static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
@@ -142,26 +142,6 @@ ResourceOwnerForgetPlanCacheRef(ResourceOwner owner, CachedPlan *plan)
/* GUC parameter */
int plan_cache_mode = PLAN_CACHE_MODE_AUTO;
-/*
- * Lock acquisition policy for execution locks.
- *
- * LOCK_ALL acquires locks on all relations mentioned in the plan,
- * reproducing the behavior of AcquireExecutorLocks().
- *
- * LOCK_UNPRUNED restricts locking to only the unpruned relations. That
- * includes those mentioned in PlannedStmt.unprunableRelids and the leaf
- * partitions remaining after performing initial pruning.
- */
-typedef enum LockPolicy
-{
- LOCK_ALL,
- LOCK_UNPRUNED,
-} LockPolicy;
-
-static void AcquireExecutorLocksWithPolicy(List *stmt_list,
- LockPolicy policy, bool acquire,
- CachedPlanPrepData *cprep);
-
/*
* InitPlanCache: initialize module during InitPostgres.
*
@@ -963,7 +943,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
}
/*
- * PrepAndCheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
+ * CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
* If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
* compute the set of partitions that survive initial runtime pruning in order
@@ -977,7 +957,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
+CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -1005,15 +985,16 @@ PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
*/
if (plan->is_valid)
{
- LockPolicy policy = !cprep ? LOCK_ALL : LOCK_UNPRUNED;
-
/*
* Plan must have positive refcount because it is referenced by
* plansource; so no need to fear it disappears under us here.
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, true, cprep);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, true, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1035,7 +1016,10 @@ PrepAndCheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocksWithPolicy(plan->stmt_list, policy, false, cprep);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, false, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, false);
/* Also clean up ExecutorPrep() state, if necessary. */
CachedPlanPrepCleanup(cprep);
@@ -1358,7 +1342,7 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
{
if (cprep)
cprep->params = boundParams;
- if (PrepAndCheckCachedPlan(plansource, cprep))
+ if (CheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1945,43 +1929,13 @@ QueryListGetPrimaryStmt(List *stmts)
}
/*
- * AcquireExecutorLocksWithPolicy
- * Acquire or release execution locks for a cached plan according to
- * the specified policy.
- *
- * LOCK_ALL reproduces AcquireExecutorLocks(), locking every relation in
- * each PlannedStmt's rtable. LOCK_UNPRUNED restricts locking to the
- * unprunable rels and partitions that survive initial runtime pruning.
- *
- * When LOCK_UNPRUNED is used on acquire, ExecutorPrep() is invoked for
- * each PlannedStmt and the resulting EStates are appended to
- * cprep->prep_estates in cprep->context. On release, the same EState
- * list is consulted to determine which relations to unlock and each
- * EState is released.
- */
-static void
-AcquireExecutorLocksWithPolicy(List *stmt_list, LockPolicy policy, bool acquire,
- CachedPlanPrepData *cprep)
-{
- switch (policy)
- {
- case LOCK_ALL:
- AcquireExecutorLocks(stmt_list, acquire);
- break;
- case LOCK_UNPRUNED:
- AcquireExecutorLocksUnpruned(stmt_list, acquire, cprep);
- break;
- default:
- elog(ERROR, "invalid LockPolicy");
- }
-}
-
-/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ * AcquireExecutorLocksAll: acquire locks needed for execution of a cached
+ * plan; or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksAll(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -2044,10 +1998,8 @@ LockRelids(List *rtable, Bitmapset *relids, bool acquire)
{
RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
- if (!(rte->rtekind == RTE_RELATION ||
- (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid))))
- elog(ERROR, "LockRelids(): cannot lock relation at RT index %d",
- rtindex);
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
/*
* Acquire the appropriate type of lock on each relation OID. Note
@@ -2204,7 +2156,7 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
* CachedPlanPrepCleanup
* Clean up EState built for a generic plan.
*
- * This is used in the corner case where PrepAndCheckCachedPlan() discovers
+ * This is used in the corner case where CheckCachedPlan() discovers
* that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
* has already run. In that case we must both release the execution locks
* and dispose of the ExecPrep list stored in CachedPlanPrepData, since the
@@ -2214,10 +2166,14 @@ static void
CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
{
ListCell *lc;
+ ResourceOwner oldowner;
if (cprep == NULL)
return;
+ /* Switch to owner that ExecutorPrep() would have used. */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = cprep->owner;
foreach(lc, cprep->prep_estates)
{
EState *prep_estate = (EState *) lfirst(lc);
@@ -2228,6 +2184,7 @@ CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
ExecCloseRangeTableRelations(prep_estate);
FreeExecutorState(prep_estate);
}
+ CurrentResourceOwner = oldowner;
list_free(cprep->prep_estates);
cprep->prep_estates = NIL;
--
2.47.3
[application/octet-stream] v6-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (7.8K, 5-v6-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From 836f0b63ced2546b594643043b7d0055ffaa7b66 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:09:23 +0900
Subject: [PATCH v6 5/6] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
prep EState pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep estate's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.
This allows SQL-language functions to participate in the same
ExecutorPrep machinery as other plan cache users.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
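For readers following along, here is a rough, standalone sketch of how a
statement list might be paired with an optional list of prep states, one per
statement, falling back to NULL when no prep list was produced. It is a toy
model only: ToyPrepState, toy_next_prep_state() and toy_attach_prep_states()
are hypothetical names, and the behaviour assumed for the patch's
next_prep_estate() helper (return the next entry, or NULL when the list is
empty) is a guess, since the helper's definition is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a per-statement prep state; not a PostgreSQL type. */
typedef struct ToyPrepState
{
    int         stmt_no;
} ToyPrepState;

/*
 * Hypothetical cursor over an optional array of prep states.  Returns NULL
 * when no array was supplied or when it has been exhausted; otherwise
 * returns the current entry and advances the cursor.
 */
static ToyPrepState *
toy_next_prep_state(ToyPrepState **states, size_t nstates, size_t *pos)
{
    assert(pos != NULL);
    if (states == NULL || *pos >= nstates)
        return NULL;
    return states[(*pos)++];
}

/* Attach prep state i to statement slot i, or NULL if none is available. */
static void
toy_attach_prep_states(ToyPrepState **out, size_t nstmts,
                       ToyPrepState **states, size_t nstates)
{
    size_t      pos = 0;

    for (size_t i = 0; i < nstmts; i++)
        out[i] = toy_next_prep_state(states, nstates, &pos);
}
```

Calling toy_attach_prep_states() with a NULL array leaves every slot NULL,
which models the case where the caller did not ask for early preparation.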
---
src/backend/executor/functions.c | 29 +++++++++++++--
src/test/regress/expected/plancache.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 34 ++++++++++++++++++
3 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 65dfae58dcf..c70e06d8886 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ EState *prep_estate; /* EState created in ExecutorPrep() for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ ListCell *prep_lc;
/*
* Clean up after previous query, if there was one.
@@ -695,11 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -720,9 +732,11 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ prep_lc = list_head(cprep.prep_estates);
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ EState *prep_estate = next_prep_estate(cprep.prep_estates, &prep_lc);
execution_state *newes;
/*
@@ -764,6 +778,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep_estate = prep_estate;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1362,6 +1377,15 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per-call lifetime.
+ */
+ if (es->prep_estate)
+ MemoryContextSetParent(es->prep_estate->es_query_cxt,
+ fcache->subcontext);
+
es->qd = CreateQueryDesc(es->stmt,
fcache->func->src,
GetActiveSnapshot(),
@@ -1370,7 +1394,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
0,
- NULL);
+ es->prep_estate);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
@@ -1461,6 +1485,7 @@ postquel_end(execution_state *es, SQLFunctionCachePtr fcache)
FreeQueryDesc(es->qd);
es->qd = NULL;
+ es->prep_estate = NULL;
MemoryContextSwitchTo(oldcontext);
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 1d69ab0a1c2..371673a6e96 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -459,4 +459,52 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select * from sqlf_base order by 1;
+ id | val
+----+-----
+ 1 | 30
+(1 row)
+
+select * from sqlf_log order by 1;
+ id | note
+----+----------------
+ 1 | logged by rule
+ 1 | logged by rule
+(2 rows)
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 139b4688fd6..b89c9ad69a4 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -273,4 +273,38 @@ drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+select * from sqlf_base order by 1;
+select * from sqlf_log order by 1;
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v6-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (27.6K, 6-v6-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From aeaaa5059a7be06c301b1372c16829225b2770fb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v6 2/6] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep() builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present, and returns it
directly to the caller.
ExecutorStart() now checks if QueryDesc->estate is already set
(indicating ExecutorPrep() was called earlier). If so, it reuses
the EState to avoid redoing range table setup and pruning.
Otherwise, it invokes ExecutorPrep() itself and adopts the
resulting EState for the duration of the query. This keeps the
executor startup behavior unchanged while making the setup work
callable separately when needed.
CreateQueryDesc() grows a prep_estate argument to accept an
optionally pre-created EState and stores it in the QueryDesc.
Portals, SPI, SQL functions, and EXPLAIN are wired to carry
optional EState pointers alongside the PlannedStmt list, but most
callers still pass NULL and let ExecutorStart() perform the setup
lazily.
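The reuse-or-create flow described above can be illustrated with a small
standalone model. This is only a sketch under simplifying assumptions: the
Toy* types and toy_* functions are hypothetical stand-ins, not the real
EState, QueryDesc, ExecutorPrep(), CreateQueryDesc() or ExecutorStart(), and
the "preparation" work is reduced to setting a flag and bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; none of these are PostgreSQL types. */
typedef struct ToyEState
{
    bool        pruning_done;   /* set once "initial pruning" has run */
} ToyEState;

typedef struct ToyQueryDesc
{
    ToyEState  *estate;         /* NULL until preparation has happened */
} ToyQueryDesc;

static int  toy_prep_calls = 0; /* counts how often preparation ran */

/* Models the preparation step: build the state and do the up-front work. */
static ToyEState *
toy_executor_prep(void)
{
    ToyEState  *estate = malloc(sizeof(ToyEState));

    assert(estate != NULL);
    estate->pruning_done = true;
    toy_prep_calls++;
    return estate;
}

/* Models a descriptor constructor taking an optional pre-built state. */
static ToyQueryDesc
toy_create_query_desc(ToyEState *prep_estate)
{
    ToyQueryDesc qd;

    qd.estate = prep_estate;
    return qd;
}

/* Models startup: reuse an existing state, otherwise prepare lazily. */
static void
toy_executor_start(ToyQueryDesc *qd)
{
    if (qd->estate == NULL)
        qd->estate = toy_executor_prep();
    assert(qd->estate->pruning_done);
}
```

In this model a caller that passes NULL gets the lazy path, while a caller
that prepared earlier hands the state in and startup does not prepare again.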
ExecutorPrep() requires the caller to have established an active
snapshot, as partition pruning expressions may call PL functions
that internally require one (e.g., via EnsurePortalSnapshotExists()).
Update executor/README and related comments to document the new
control flow and the separation between preparation and execution.
Note that as of this commit, ExecutorStart() is the only caller of
ExecutorPrep(), so there is no semantic change in behavior. Later
commits will add specialized callers that invoke ExecutorPrep()
earlier to enable pruning-aware locking in cached plans.
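As a rough illustration of what pruning-aware locking means in this series
(lock the unprunable relations plus the partitions that survive initial
pruning, rather than every relation in the range table), here is a toy
model using plain bitmasks. ToyRelids and toy_relids_to_lock() are
hypothetical; the real code works with Bitmapsets and range table indexes,
and this sketch does not attempt to mirror it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy relation set: bit i set means range table index i is a member. */
typedef uint32_t ToyRelids;

/*
 * Return the set of relations to lock.  Without pruning awareness every
 * relation in the plan is locked.  With it, only the unprunable relations
 * and the partitions that survived initial pruning are locked.
 */
static ToyRelids
toy_relids_to_lock(ToyRelids all_rels, ToyRelids unprunable,
                   ToyRelids surviving_partitions, bool pruning_aware)
{
    /* Both inputs must be subsets of the plan's relations. */
    assert((unprunable & ~all_rels) == 0);
    assert((surviving_partitions & ~all_rels) == 0);

    if (!pruning_aware)
        return all_rels;
    return unprunable | surviving_partitions;
}
```

With five relations in the plan, one unprunable parent and one surviving
partition, the pruning-aware set contains two relations instead of five.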
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/executor/README | 11 +-
src/backend/executor/execMain.c | 176 +++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 9 +-
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 24 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 5 +-
src/include/executor/executor.h | 26 ++++
src/include/nodes/execnodes.h | 1 -
src/include/utils/portal.h | 2 +
20 files changed, 241 insertions(+), 52 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9ceeff6d99e..ef1ee2568c6 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -875,7 +875,7 @@ BeginCopyTo(ParseState *pstate,
cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/*
* Call ExecutorStart to prepare the plan for execution.
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 270e9bf3110..b4a9808955a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -336,7 +336,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 93918a223b8..40564d4dff9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -550,7 +551,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create a QueryDesc for the query */
queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ dest, params, queryEnv, instrument_option,
+ prep_estate);
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 963618a64c4..ff759ddd07c 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1173,7 +1173,7 @@ execute_sql_string(const char *sql, const char *filename)
qdesc = CreateQueryDesc(stmt,
sql,
GetActiveSnapshot(), NULL,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2cdfdcf984b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..1e880a6d7c9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NIL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 5b86a727587..005fbb48aa5 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,7 +576,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_estates;
ListCell *p;
+ ListCell *prep_lc;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -650,14 +653,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_estates = NIL;
/* Explain each query */
+ prep_lc = list_head(prep_estates);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep_estate,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..d749ceb6687 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,18 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Creates EState,
+ performs range table initialization, permission checks, and initial
+ partition pruning. Returns the EState that ExecutorStart() should
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if not already done, indicated by NULL QueryDesc.estate)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 654f9246ad0..d7e99690c7f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -55,6 +55,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -145,7 +146,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -171,9 +171,71 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When
+ * no prep EState was provided, AcquireExecutorLocks() should have
+ * locked every relation in the plan. When one was provided,
+ * pruning-aware locking should have locked at least the unpruned
+ * relations. Both checks are skipped in parallel workers, which
+ * acquire relation locks lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +325,84 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart().
+ *
+ * Returns an EState that can be reused later.
+ */
+EState *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Caller must have established an active snapshot. */
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false -- they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ return estate;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -838,38 +978,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecCreatePartitionPruneStates(estate);
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..024780d3516 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1300,7 +1300,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ receiver, paramLI, NULL, instrument_options,
+ NULL);
}
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 4ca342a43ef..c93e2664cfd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1368,7 +1368,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest,
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
- 0);
+ 0,
+ NULL);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 3019a3b2b97..994a69a1c8e 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2499,6 +2500,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_estates;
+ ListCell *prep_lc;
spicallbackarg.query = plansource->query_string;
@@ -2577,6 +2580,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_estates = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2614,9 +2618,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ prep_lc = list_head(prep_estates);
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2694,7 +2700,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
dest,
options->params,
_SPI_current->queryEnv,
- 0);
+ 0,
+ prep_estate);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d01a09dd0c4..cd1e429ceed 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1230,6 +1230,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2029,6 +2030,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..b18266487bb 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -72,7 +73,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options)
+ int instrument_options,
+ EState *prep_estate)
{
QueryDesc *qd = palloc_object(QueryDesc);
@@ -93,6 +95,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->planstate = NULL;
qd->totaltime = NULL;
+ /* Use the EState created by ExecutorPrep() if already done. */
+ qd->estate = prep_estate;
+
/* not yet executed */
qd->already_executed = false;
@@ -123,6 +128,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep_estate: EState created in ExecutorPrep() for the query, if any
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +141,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -148,7 +155,8 @@ ProcessQuery(PlannedStmt *plan,
*/
queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0,
+ prep_estate);
/*
* Call ExecutorStart to prepare the plan for execution
@@ -495,7 +503,10 @@ PortalStart(Portal portal, ParamListInfo params,
None_Receiver,
params,
portal->queryEnv,
- 0);
+ 0,
+ portal->prep_estates ?
+ (EState *) linitial(portal->prep_estates) :
+ NULL);
/*
* If it's a scrollable cursor, executor needs to support
@@ -1185,6 +1196,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ ListCell *prep_lc;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1217,11 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ prep_lc = list_head(portal->prep_estates);
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ EState *prep_estate = next_prep_estate(portal->prep_estates, &prep_lc);
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1279,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1288,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index c1a53e658cb..941e95010c3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -297,6 +298,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->prep_estates = prep_estates;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 86226f8db70..3756a11345f 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..3a2169c9613 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
@@ -63,7 +63,8 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options);
+ int instrument_options,
+ EState *prep_estate);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..e6fa122e6e4 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,31 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern EState *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning,
+ int eflags);
+
+/*
+ * Walk a prep_estates list in step with a parallel stmt_list iteration.
+ * Returns the next EState (or NULL) and advances *lc. Safe when
+ * prep_estates is NIL; just returns NULL for every call.
+ */
+static inline EState *
+next_prep_estate(List *prep_estates, ListCell **lc)
+{
+ EState *result = NULL;
+
+ if (*lc != NULL)
+ {
+ result = (EState *) lfirst(*lc);
+ *lc = lnext(prep_estates, *lc);
+ }
+ return result;
+}
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..84d80e3ab0d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -775,7 +775,6 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
-
/*
* ExecRowMark -
* runtime representation of FOR [KEY] UPDATE/SHARE clauses
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..f69b4b9b479 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *prep_estates; /* list of EStates where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
[application/octet-stream] v6-0001-Refactor-partition-pruning-initialization-for-cla.patch (10.2K, 7-v6-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 6f2c9cc7a30d38cb2606595f62b62c77e2aba6e9 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 15:08:52 +0900
Subject: [PATCH v6 1/6] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called at a time when the PARAM_EXEC
parameters have not yet been set up.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execPartition.c | 103 +++++++++++++++++++--------
src/include/executor/execPartition.h | 1 +
3 files changed, 74 insertions(+), 31 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..654f9246ad0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -868,6 +868,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
*/
+ ExecCreatePartitionPruneStates(estate);
ExecDoInitialPruning(estate);
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index bab294f5e91..20c3513fabe 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -184,8 +184,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1942,6 +1941,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1966,6 +1968,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1973,11 +1998,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1995,20 +2020,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -2016,8 +2034,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2135,14 +2151,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2376,8 +2390,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2389,10 +2403,29 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
+ }
+#endif
+ }
j++;
}
@@ -2489,9 +2522,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2519,9 +2553,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 82063ec2a16..4c96808c376 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2026-03-09 04:41 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2026-03-09 04:41 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Sat, Mar 7, 2026 at 6:54 PM Amit Langote <[email protected]> wrote:
> Attached is v6 of the patch series. I've been working toward
> committing this, so I wanted to lay out the ExecutorPrep() design and
> the key trade-offs before doing so.
>
> When a cached generic plan references a partitioned table,
> GetCachedPlan() locks all partitions upfront via
> AcquireExecutorLocks(), even those that initial pruning will
> eliminate. But initial partition pruning only runs later during
> ExecutorStart(). Moving pruning earlier requires some executor setup
> (range table, permissions, pruning state), and ExecutorPrep() is the
> vehicle for that. Unlike the approach reverted last May, this
> keeps the CachedPlan itself unchanged -- all per-execution state flows
> through a separate CachedPlanPrepData that the caller provides.
>
> The approach also keeps GetCachedPlan()'s interface
> backward-compatible: the new CachedPlanPrepData argument is optional.
> If a caller passes NULL, all partitions are locked as before and
> nothing changes. This means existing callers and any new code that
> calls GetCachedPlan() without caring about pruning-aware locking just
> works.
>
> The risk is on the other side: if a caller does pass a
> CachedPlanPrepData, GetCachedPlan() will lock only the surviving
> partitions and populate prep_estates with the EStates that
> ExecutorPrep() created. The caller then must make those EStates
> available to ExecutorStart() -- via QueryDesc->estate,
> portal->prep_estates, or the equivalent path for SPI and SQL
> functions. If it fails to do so, ExecutorStart() will call
> ExecutorPrep() again, which may compute different pruning results than
> the original call, potentially expecting locks on relations that were
> never acquired. The executor would then operate on relations it
> doesn't hold locks on.
>
> So the contract is: if you opt in to pruning-aware locking by passing
> CachedPlanPrepData, you must complete the pipeline by delivering the
> prep EStates to the executor. In the current patch, all the call sites
> that pass a CachedPlanPrepData (portals, SPI, EXECUTE, SQL functions,
> EXPLAIN) do thread the EStates through correctly, and I've tried to
> make the plumbing straightforward enough that it's hard to get wrong.
> But it is a new invariant that didn't exist before, and a caller that
> gets it wrong would fail silently rather than with an obvious error.
>
> To catch such violations, I've added a debug-only check in
> standard_ExecutorStart() that fires when no prep EState was provided.
> It iterates over the plan's rtable and verifies that every lockable
> relation is actually locked. It should always be true if
> AcquireExecutorLocks() locked everything, but would fail if
> pruning-aware locking happened upstream and the caller dropped the
> prep EState. The check is skipped in parallel workers, which acquire
> relation locks lazily in ExecGetRangeTableRelation().
>
> + if (queryDesc->estate == NULL)
> + {
> +#ifdef USE_ASSERT_CHECKING
> + if (!IsParallelWorker())
> + {
> + ListCell *lc;
> +
> + foreach(lc, queryDesc->plannedstmt->rtable)
> + {
> + RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
> +
> + if (rte->rtekind == RTE_RELATION ||
> + (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
> + Assert(CheckRelationOidLockedByMe(rte->relid,
> + rte->rellockmode,
> + true));
> + }
> + }
> +#endif
> + queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
> + queryDesc->params,
> + CurrentResourceOwner,
> + true,
> + eflags);
> + }
> +#ifdef USE_ASSERT_CHECKING
> + else
> + {
> + /*
> + * A prep EState was provided, meaning pruning-aware locking
> + * should have locked at least the unpruned relations.
> + */
> + if (!IsParallelWorker())
> + {
> + int rtindex = -1;
> +
> + while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
> + rtindex)) >= 0)
> + {
> + RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
> +
> + Assert(rte->rtekind == RTE_RELATION ||
> + (rte->rtekind == RTE_SUBQUERY &&
> + rte->relid != InvalidOid));
> + Assert(CheckRelationOidLockedByMe(rte->relid,
> + rte->rellockmode, true));
> + }
> + }
> + }
> +#endif
>
> So the invariant is: if no prep EState was provided, every relation in
> the plan is locked; if one was provided, at least the unpruned
> relations are locked. Both are checked in assert builds.
>
> I think this covers the main concerns, but I may be missing something.
> If anyone sees a problem with this approach, I'd like to hear about
> it.
Here's v7. Some plancache.c changes that I'd made were in the wrong
patch in v6; this version puts them where they belong.
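For readers who want the locking contract quoted above in a compact form, here is a rough standalone sketch of it. This is a toy model, not code from the patch: ToyLocks, ToyPrep and the toy_* functions are hypothetical stand-ins for the backend's lock table, the prep EState's es_unpruned_relids, and the two locking paths, and it only models leaf partitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only: these types and functions are hypothetical stand-ins,
 * not PostgreSQL APIs.  Leaf partitions are numbered 0..TOY_NPARTS-1.
 */
#define TOY_NPARTS 3

typedef struct ToyLocks
{
	bool		held[TOY_NPARTS];	/* locks held by this backend */
} ToyLocks;

typedef struct ToyPrep
{
	bool		unpruned[TOY_NPARTS];	/* survivors of initial pruning */
} ToyPrep;

/* Caller passed no prep data: lock every partition, as before. */
static void
toy_lock_all(ToyLocks *locks)
{
	for (int i = 0; i < TOY_NPARTS; i++)
		locks->held[i] = true;
}

/*
 * Caller opted in: "prune" down to the partition matching key, lock only
 * that one, and remember the result in *prep for the executor.
 */
static void
toy_lock_unpruned(ToyLocks *locks, ToyPrep *prep, int key)
{
	for (int i = 0; i < TOY_NPARTS; i++)
	{
		prep->unpruned[i] = (i == key);
		if (prep->unpruned[i])
			locks->held[i] = true;
	}
}

/*
 * The check made at executor start: with no prep state every partition
 * must be locked; with prep state only the unpruned ones must be.
 */
static bool
toy_start_locks_ok(const ToyLocks *locks, const ToyPrep *prep)
{
	for (int i = 0; i < TOY_NPARTS; i++)
	{
		bool		required = (prep == NULL) || prep->unpruned[i];

		if (required && !locks->held[i])
			return false;
	}
	return true;
}
```

Calling toy_lock_unpruned() and then toy_start_locks_ok() with a NULL prep pointer models a caller that opted in to pruning-aware locking but dropped the prep state on the way to the executor. The check returns false in that case, which is the situation the debug-only assertion is meant to catch.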
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v7-0003-Add-test-for-partition-lock-behavior-with-generic.patch (5.3K, 2-v7-0003-Add-test-for-partition-lock-behavior-with-generic.patch)
From 58179bd0d3730dbd1fdbb0bd9c624dc7ae770830 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:00:32 +0900
Subject: [PATCH v7 3/6] Add test for partition lock behavior with generic
cached plans
Add a regression test that inspects pg_locks to verify which child
partitions are locked when executing a prepared statement that uses
a generic cached plan.
Two cases are tested: one with enable_partition_pruning on and one
with it off. Currently both cases lock all child partitions, because
GetCachedPlan() acquires execution locks on every relation in the
plan regardless of pruning.
A subsequent commit that adds pruning-aware locking will update the
expected output for the pruning-enabled case, showing that only the
surviving partition is locked.
---
src/test/regress/expected/partition_prune.out | 83 +++++++++++++++++++
src/test/regress/sql/partition_prune.sql | 55 ++++++++++++
2 files changed, 138 insertions(+)
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..39dab8fcc05 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,86 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+drop table prunelock_p;
+reset plan_cache_mode;
+reset enable_partition_pruning;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..229c5eb370c 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,58 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+drop table prunelock_p;
+reset plan_cache_mode;
+reset enable_partition_pruning;
--
2.47.3
[application/octet-stream] v7-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (7.8K, 3-v7-0005-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From c67ec5cc6bbe20d7ad14fb99cd1696939c6ec70f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:09:23 +0900
Subject: [PATCH v7 5/6] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
prep EState pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep EState's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.
This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 29 +++++++++++++--
src/test/regress/expected/plancache.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 34 ++++++++++++++++++
3 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 65dfae58dcf..c70e06d8886 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -72,6 +72,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ EState *prep_estate; /* EState created in ExecutorPrep() for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -657,6 +658,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ ListCell *prep_lc;
/*
* Clean up after previous query, if there was one.
@@ -695,11 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -720,9 +732,11 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ prep_lc = list_head(cprep.prep_estates);
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ EState *prep_estate = next_prep_estate(cprep.prep_estates, &prep_lc);
execution_state *newes;
/*
@@ -764,6 +778,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep_estate = prep_estate;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1362,6 +1377,15 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per-call lifetime.
+ */
+ if (es->prep_estate)
+ MemoryContextSetParent(es->prep_estate->es_query_cxt,
+ fcache->subcontext);
+
es->qd = CreateQueryDesc(es->stmt,
fcache->func->src,
GetActiveSnapshot(),
@@ -1370,7 +1394,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
0,
- NULL);
+ es->prep_estate);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
@@ -1461,6 +1485,7 @@ postquel_end(execution_state *es, SQLFunctionCachePtr fcache)
FreeQueryDesc(es->qd);
es->qd = NULL;
+ es->prep_estate = NULL;
MemoryContextSwitchTo(oldcontext);
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 1d69ab0a1c2..371673a6e96 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -459,4 +459,52 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select * from sqlf_base order by 1;
+ id | val
+----+-----
+ 1 | 30
+(1 row)
+
+select * from sqlf_log order by 1;
+ id | note
+----+----------------
+ 1 | logged by rule
+ 1 | logged by rule
+(2 rows)
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 139b4688fd6..b89c9ad69a4 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -273,4 +273,38 @@ drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+select * from sqlf_base order by 1;
+select * from sqlf_log order by 1;
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v7-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (27.6K, 4-v7-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From aeaaa5059a7be06c301b1372c16829225b2770fb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v7 2/6] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep() builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present, and returns it
directly to the caller.
ExecutorStart() now checks if QueryDesc->estate is already set
(indicating ExecutorPrep() was called earlier). If so, it reuses
the EState to avoid redoing range table setup and pruning.
Otherwise, it invokes ExecutorPrep() itself and adopts the
resulting EState for the duration of the query. This keeps the
executor startup behavior unchanged while making the setup work
callable separately when needed.
CreateQueryDesc() grows a prep_estate argument to accept an
optionally pre-created EState and stores it in the QueryDesc.
Portals, SPI, SQL functions, and EXPLAIN are wired to carry
optional EState pointers alongside the PlannedStmt list, but most
callers still pass NULL and let ExecutorStart() perform the setup
lazily.
ExecutorPrep() requires the caller to have established an active
snapshot, as partition pruning expressions may call PL functions
that internally require one (e.g., via EnsurePortalSnapshotExists()).
Update executor/README and related comments to document the new
control flow and the separation between preparation and execution.
Note that as of this commit, ExecutorStart() is the only caller of
ExecutorPrep(), so there is no semantic change in behavior. Later
commits will add specialized callers that invoke ExecutorPrep()
earlier to enable pruning-aware locking in cached plans.
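For readers skimming the thread, the "reuse if already prepared, otherwise prepare lazily" control flow described in this commit message can be modelled roughly as below. This is only a toy sketch: ToyEState, ToyQueryDesc, and the toy_* functions are hypothetical stand-ins invented for illustration, not the PostgreSQL structs or functions touched by the patch, and the real ExecutorPrep() does far more (permission checks, range table setup, pruning).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the executor's query-wide state. */
typedef struct ToyEState
{
    bool pruning_done;      /* did the prep step run initial pruning? */
} ToyEState;

/* Hypothetical stand-in for a query descriptor. */
typedef struct ToyQueryDesc
{
    ToyEState *estate;      /* NULL until some prep step has run */
} ToyQueryDesc;

/* Counts prep invocations so the reuse behavior can be observed. */
static int toy_prep_calls = 0;

/* Models the separately callable setup step (never freed in this sketch). */
static ToyEState *
toy_executor_prep(bool do_initial_pruning)
{
    ToyEState *estate = calloc(1, sizeof(ToyEState));

    if (estate == NULL)
        abort();
    toy_prep_calls++;
    estate->pruning_done = do_initial_pruning;
    return estate;
}

/* Models the descriptor constructor taking an optional prepared state. */
static ToyQueryDesc
toy_create_query_desc(ToyEState *prep_estate)
{
    ToyQueryDesc qd;

    qd.estate = prep_estate;    /* may be NULL: prep happens lazily */
    return qd;
}

/* Models startup: reuse an existing state, else prepare one now. */
static void
toy_executor_start(ToyQueryDesc *qd)
{
    if (qd->estate == NULL)
        qd->estate = toy_executor_prep(true);
    assert(qd->estate != NULL);
}
```

In this model, a caller that passes NULL gets exactly one prep call from the start function, while a caller that prepared earlier sees its own state adopted unchanged and no second prep call.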
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/executor/README | 11 +-
src/backend/executor/execMain.c | 176 +++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 9 +-
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 24 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 5 +-
src/include/executor/executor.h | 26 ++++
src/include/nodes/execnodes.h | 1 -
src/include/utils/portal.h | 2 +
20 files changed, 241 insertions(+), 52 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 9ceeff6d99e..ef1ee2568c6 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -875,7 +875,7 @@ BeginCopyTo(ParseState *pstate,
cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/*
* Call ExecutorStart to prepare the plan for execution.
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 270e9bf3110..b4a9808955a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -336,7 +336,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 93918a223b8..40564d4dff9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -370,7 +370,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -492,7 +492,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -550,7 +551,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create a QueryDesc for the query */
queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ dest, params, queryEnv, instrument_option,
+ prep_estate);
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 963618a64c4..ff759ddd07c 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1173,7 +1173,7 @@ execute_sql_string(const char *sql, const char *filename)
qdesc = CreateQueryDesc(stmt,
sql,
GetActiveSnapshot(), NULL,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2cdfdcf984b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..1e880a6d7c9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NIL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 5b86a727587..005fbb48aa5 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -205,6 +205,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -575,7 +576,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_estates;
ListCell *p;
+ ListCell *prep_lc;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -650,14 +653,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_estates = NIL;
/* Explain each query */
+ prep_lc = list_head(prep_estates);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep_estate,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..d749ceb6687 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,18 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Creates EState,
+ performs range table initialization, permission checks, and initial
+ partition pruning. Returns the EState that ExecutorStart() should
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if not already done, indicated by NULL QueryDesc.estate)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 654f9246ad0..d7e99690c7f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -55,6 +55,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -145,7 +146,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -171,9 +171,71 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When
+ * no prep EState was provided, AcquireExecutorLocks() should have
+ * locked every relation in the plan. When one was provided,
+ * pruning-aware locking should have locked at least the unpruned
+ * relations. Both checks are skipped in parallel workers, which
+ * acquire relation locks lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ true,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -263,6 +325,84 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true.
+ *
+ * This is intended for callers that need executor metadata ahead of actual
+ * execution. Typical use cases include:
+ * - determining which relations must be locked during plan cache validation;
+ * - initializing unpruned relids and valid subplans in parallel workers
+ * using state copied from the leader.
+ *
+ * The executor can reuse the resulting state to avoid redundant setup during
+ * ExecutorStart().
+ *
+ * Returns an EState that can be reused later.
+ */
+EState *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ bool do_initial_pruning, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Caller must have established an active snapshot. */
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to compute
+ * the subset of child subplans that will be executed. The results,
+ * which are bitmapsets of selected child indexes, are saved in
+ * es_part_prune_results. This list is parallel to es_part_prune_infos.
+ *
+ * In parallel workers, do_initial_pruning should be false -- they receive
+ * es_part_prune_results from the leader process and should only initialize
+ * the PartitionPruneStates.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ return estate;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -838,38 +978,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecCreatePartitionPruneStates(estate);
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..024780d3516 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1300,7 +1300,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ receiver, paramLI, NULL, instrument_options,
+ NULL);
}
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 4ca342a43ef..c93e2664cfd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1368,7 +1368,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest,
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
- 0);
+ 0,
+ NULL);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 3019a3b2b97..994a69a1c8e 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1685,6 +1685,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2499,6 +2500,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_estates;
+ ListCell *prep_lc;
spicallbackarg.query = plansource->query_string;
@@ -2577,6 +2580,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_estates = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2614,9 +2618,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ prep_lc = list_head(prep_estates);
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2694,7 +2700,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
dest,
options->params,
_SPI_current->queryEnv,
- 0);
+ 0,
+ prep_estate);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index d01a09dd0c4..cd1e429ceed 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1230,6 +1230,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2029,6 +2030,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..b18266487bb 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -72,7 +73,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options)
+ int instrument_options,
+ EState *prep_estate)
{
QueryDesc *qd = palloc_object(QueryDesc);
@@ -93,6 +95,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->planstate = NULL;
qd->totaltime = NULL;
+ /* Use the EState created by ExecutorPrep() if already done. */
+ qd->estate = prep_estate;
+
/* not yet executed */
qd->already_executed = false;
@@ -123,6 +128,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep_estate: EState created in ExecutorPrep() for the query, if any
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +141,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -148,7 +155,8 @@ ProcessQuery(PlannedStmt *plan,
*/
queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0,
+ prep_estate);
/*
* Call ExecutorStart to prepare the plan for execution
@@ -495,7 +503,10 @@ PortalStart(Portal portal, ParamListInfo params,
None_Receiver,
params,
portal->queryEnv,
- 0);
+ 0,
+ portal->prep_estates ?
+ (EState *) linitial(portal->prep_estates) :
+ NULL);
/*
* If it's a scrollable cursor, executor needs to support
@@ -1185,6 +1196,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ ListCell *prep_lc;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1217,11 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ prep_lc = list_head(portal->prep_estates);
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ EState *prep_estate = next_prep_estate(portal->prep_estates, &prep_lc);
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1279,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1288,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index c1a53e658cb..941e95010c3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -284,6 +284,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -297,6 +298,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->prep_estates = prep_estates;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 86226f8db70..3756a11345f 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -63,7 +63,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..3a2169c9613 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
@@ -63,7 +63,8 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options);
+ int instrument_options,
+ EState *prep_estate);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index d46ba59895d..e6fa122e6e4 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -20,6 +20,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -234,6 +235,31 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern EState *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ bool do_initial_pruning,
+ int eflags);
+
+/*
+ * Walk a prep_estates list in step with a parallel stmt_list iteration.
+ * Returns the next EState (or NULL) and advances *lc. Safe when
+ * prep_estates is NIL; just returns NULL for every call.
+ */
+static inline EState *
+next_prep_estate(List *prep_estates, ListCell **lc)
+{
+ EState *result = NULL;
+
+ if (*lc != NULL)
+ {
+ result = (EState *) lfirst(*lc);
+ *lc = lnext(prep_estates, *lc);
+ }
+ return result;
+}
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 63c067d5aae..84d80e3ab0d 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -775,7 +775,6 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
-
/*
* ExecRowMark -
* runtime representation of FOR [KEY] UPDATE/SHARE clauses
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..f69b4b9b479 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *prep_estates; /* list of EStates where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
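The next_prep_estate() helper in the patch above walks an optional list in step with the statement list. The same idea can be sketched with a plain linked list as follows. This is an illustrative model only: ToyCell, toy_next_prep(), and toy_count_prepared() are hypothetical names, and the real helper works on PostgreSQL List/ListCell rather than on this struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked cell standing in for a list cell. */
typedef struct ToyCell
{
    void           *value;  /* per-statement prepared state, may be NULL */
    struct ToyCell *next;
} ToyCell;

/*
 * Return the value at *cursor (or NULL if the list is exhausted or empty)
 * and advance the cursor.  An empty list therefore yields NULL every time.
 */
static void *
toy_next_prep(ToyCell **cursor)
{
    void *result = NULL;

    if (*cursor != NULL)
    {
        result = (*cursor)->value;
        *cursor = (*cursor)->next;
    }
    return result;
}

/*
 * Iterate over nstmts statements while walking the prep list in parallel,
 * counting how many statements received a non-NULL prepared state.
 */
static int
toy_count_prepared(int nstmts, ToyCell *prep_head)
{
    ToyCell *cursor = prep_head;
    int      prepared = 0;

    for (int i = 0; i < nstmts; i++)
    {
        if (toy_next_prep(&cursor) != NULL)
            prepared++;
    }
    return prepared;
}
```

The point of the cursor-based shape is that callers which never populate the prep list (most of them in this commit) need no special casing: every statement simply sees NULL and falls back to lazy setup.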
[application/octet-stream] v7-0004-Use-pruning-aware-locking-in-cached-plans.patch (36.1K, 5-v7-0004-Use-pruning-aware-locking-in-cached-plans.patch)
From e0130ef11bfb97dba5afce22370cba5f3741ab0a Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:30:52 +0900
Subject: [PATCH v7 4/6] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry the EStates created by
ExecutorPrep() through the plan caching layer. The prep_estates
list is indexed one-to-one with CachedPlan->stmt_list and is
populated when GetCachedPlan() prepares a reused generic plan.
Adjust call sites in SPI, functions, portals, and EXPLAIN to
propagate this data.
Partition pruning expressions may call PL functions that require
an active snapshot (e.g., via EnsurePortalSnapshotExists()).
AcquireExecutorLocksUnpruned() establishes one before calling
ExecutorPrep() if needed, ensuring these expressions can execute
correctly during plan cache validation.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
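The lock set implied by the paragraphs above can be sketched with plain
bitmasks. This is only an illustration under simplifying assumptions:
ToyRelids and toy_extra_lock_set() are hypothetical names, range-table
indexes are limited to 0..63, and a 64-bit mask stands in for a Bitmapset.
It computes the relations still to be locked after the unprunable ones
have been handled: the relations that survived initial pruning, minus the
unprunable ones, plus every first result relation.

```c
#include <assert.h>
#include <stdint.h>

/* Toy relid set: bit i set means range-table index i is a member. */
typedef uint64_t ToyRelids;

/*
 * Compute the set of relations to lock in addition to the unprunable ones.
 *
 * unpruned          relations that survived initial pruning (this set also
 *                   contains the unprunable relations)
 * unprunable        relations that are always locked, handled separately
 * first_result_rels first result relation of each ModifyTable node, which
 *                   must be locked even if pruned
 */
static ToyRelids
toy_extra_lock_set(ToyRelids unpruned, ToyRelids unprunable,
				   const int *first_result_rels, int nfirst)
{
	/* Drop what is already locked to avoid taking the same lock twice. */
	ToyRelids	lock = unpruned & ~unprunable;

	for (int i = 0; i < nfirst; i++)
	{
		assert(first_result_rels[i] >= 0 && first_result_rels[i] < 64);
		lock |= (ToyRelids) 1 << first_result_rels[i];
	}

	return lock;
}
```

With unprunable = {1}, unpruned = {1, 3} and a first result relation of 2
that was pruned, the result is {2, 3}: relation 3 survived pruning, and
relation 2 is locked only because the executor expects it to be usable.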
Add a regression test that causes a generic plan to become invalid
while pruning-aware setup is running. The pruning expression calls a
function that can perform DDL on a partition, making the plan stale
during reuse.
The test's purpose is to drive execution through the invalidation
path that discards any ExecutorPrep state created before the plan was
found invalid, providing coverage for that cleanup logic.
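The control flow that the test is meant to exercise can be sketched as
follows. This is a simplified model under stated assumptions, not the
patch's code: ToyPlan, ToyPrep and toy_check_plan() are hypothetical, locks
are reduced to a counter, and the invalidate_during_prep flag stands in for
a pruning expression that performs DDL while locks are being acquired.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; lock bookkeeping is reduced to a counter. */
typedef struct ToyPlan
{
	bool		is_valid;
	int			locks_held;
} ToyPlan;

typedef struct ToyPrep
{
	int			nstates;		/* number of prepared states kept for reuse */
} ToyPrep;

/*
 * Returns true if the plan can be reused.  On false, no locks taken here
 * remain held and no prepared state is left behind.
 */
static bool
toy_check_plan(ToyPlan *plan, ToyPrep *prep, int nstmts,
			   bool invalidate_during_prep)
{
	if (!plan->is_valid)
		return false;

	/* Acquire locks and build the per-statement prepared state. */
	plan->locks_held += nstmts;
	prep->nstates = nstmts;

	/* Models an invalidation arriving while pruning expressions run. */
	if (invalidate_during_prep)
		plan->is_valid = false;

	if (plan->is_valid)
		return true;

	/* The race case happened: release locks and discard prepared state. */
	plan->locks_held -= nstmts;
	prep->nstates = 0;
	return false;
}
```

In the failing case the caller sees the same state it would have seen had
the plan been invalid from the start, so it can simply replan.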
---
src/backend/commands/prepare.c | 19 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/executor/spi.c | 26 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 20 ++
src/backend/tcop/postgres.c | 9 +-
src/backend/utils/cache/plancache.c | 255 +++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 29 +-
src/test/regress/expected/partition_prune.out | 50 +++-
src/test/regress/expected/plancache.out | 62 +++++
src/test/regress/sql/partition_prune.sql | 24 +-
src/test/regress/sql/plancache.sql | 51 ++++
15 files changed, 536 insertions(+), 29 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 005fbb48aa5..e8cd47131ce 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -154,6 +154,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -193,7 +194,10 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ /* Keep ExecutorPrep state with the portal and its resowner. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -205,7 +209,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -575,6 +579,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_estates;
ListCell *p;
@@ -633,8 +638,14 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ /* ExecutorPrep state is local to this EXPLAIN EXECUTE call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -653,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/* Explain each query */
prep_lc = list_head(prep_estates);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index c93e2664cfd..65dfae58dcf 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -698,6 +698,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 793c76d4f82..a7a4baaf8af 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4858,8 +4858,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -4873,6 +4873,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 994a69a1c8e..13703969dd8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1579,6 +1579,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1659,7 +1660,11 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ /* ExecutorPrep state lives in this portal's context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1685,7 +1690,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_estates, /* lives in portalContext */
cplan);
/*
@@ -2078,6 +2083,7 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
SPICallbackArg spicallbackarg;
ErrorContextCallback spierrcontext;
@@ -2101,9 +2107,13 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
error_context_stack = &spierrcontext;
/* Get the generic plan for the query */
+ /* ExecutorPrep() state lives in caller's active context. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ &cprep);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2502,6 +2512,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
ListCell *lc2;
List *prep_estates;
ListCell *prep_lc;
+ CachedPlanPrepData cprep = {0};
spicallbackarg.query = plansource->query_string;
@@ -2576,11 +2587,16 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+
+ /* ExecutorPrep state is per _SPI_execute_plan call. */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..afa61d357c5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1b5b9b5ed9c..ddb7902bc89 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,26 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of
+ * initially prunable relations. We use bms_next_member() to get
+ * the lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition
+ * expansion preserves RT index order. There is one ModifyTable
+ * per query level, so this captures exactly one entry per level.
+ * ExecInitModifyTable() asserts that the recorded index matches
+ * what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index cd1e429ceed..5c145a31274 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1636,6 +1636,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2017,7 +2018,11 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+
+ /* ExecutorPrep() state lives in portal context. */
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2030,7 +2035,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 812e2265734..1d3244307da 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,14 +93,17 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksAll(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -942,6 +945,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning in order
+ * to only lock them. The EStates created to do so are saved in cprep for
+ * later reuse by ExecutorStart().
+ *
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
@@ -949,7 +957,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -983,7 +991,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, true, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1005,7 +1016,13 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, false, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, false);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -1285,6 +1302,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function prepares
+ * each PlannedStmt via ExecutorPrep() and stores the EStates in
+ * cprep->prep_estates. These are intended to be passed later to
+ * ExecutorStart().
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1295,7 +1317,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1317,7 +1340,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (CheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1904,11 +1929,13 @@ QueryListGetPrimaryStmt(List *stmts)
}
/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ * AcquireExecutorLocksAll: acquire locks needed for execution of a cached
+ * plan; or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksAll(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1982,214 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the EState pointer for each PlannedStmt to cprep->prep_estates
+ *
+ * On release, it:
+ * - looks up the EState for each PlannedStmt from cprep->prep_estates
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ * - cleans up each EState
+ *
+ * prep_estates is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for EState happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext = MemoryContextSwitchTo(cprep->context);
+ ListCell *lc1;
+ List *prep_estates;
+ ListCell *prep_lc;
+
+ Assert(cprep);
+
+ /*
+ * When releasing locks, use the EState list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_estates = cprep->prep_estates;
+ Assert(acquire || list_length(prep_estates) == list_length(stmt_list));
+
+ prep_lc = list_head(prep_estates);
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ EState *prep_estate;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocksAll(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_estates = lappend(cprep->prep_estates, NULL);
+ else
+ (void) next_prep_estate(prep_estates, &prep_lc);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ /*
+ * Pruning expressions may call PL functions that require an active
+ * snapshot (e.g., via EnsurePortalSnapshotExists()). Establish one
+ * if needed.
+ */
+ bool snap_pushed = false;
+
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snap_pushed = true;
+ }
+
+ prep_estate = ExecutorPrep(plannedstmt, cprep->params, cprep->owner, true,
+ cprep->eflags);
+ Assert(prep_estate);
+ cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
+
+ if (snap_pushed)
+ PopActiveSnapshot();
+ }
+ else
+ prep_estate = next_prep_estate(prep_estates, &prep_lc);
+
+ if (prep_estate)
+ {
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * We must always lock the first result relation of each
+ * ModifyTable node in the plan, that is, the ones listed in
+ * plannedstmt->firstResultRels, to satisfy the executor
+ * assumptions described in ExecInitModifyTable(). This can be
+ * wasteful, because the first result relation may not be needed
+ * at all if other result relations are unpruned and thus
+ * sufficient for the ModifyTable node's needs. Unfortunately, we
+ * don't have a per-node unpruned_relids set to determine whether
+ * other result relations are included.
+ *
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * CachedPlanPrepCleanup
+ * Clean up EState built for a generic plan.
+ *
+ * This is used in the corner case where CheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the EState list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+ ResourceOwner oldowner;
+
+ if (cprep == NULL)
+ return;
+
+ /* Switch to owner that ExecutorPrep() would have used. */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = cprep->owner;
+ foreach(lc, cprep->prep_estates)
+ {
+ EState *prep_estate = (EState *) lfirst(lc);
+
+ if (prep_estate == NULL)
+ continue;
+
+ ExecCloseRangeTableRelations(prep_estate);
+ FreeExecutorState(prep_estate);
+ }
+ CurrentResourceOwner = oldowner;
+
+ list_free(cprep->prep_estates);
+ cprep->prep_estates = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index c175ee95b68..989b3c73691 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 8c9321aab8c..1431f12a6e8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -123,6 +123,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 984c51515c6..da3ce9f3177 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,32 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_estates is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. If the
+ * plan is found invalid after locking, the same list is used to determine
+ * which relations to unlock before retrying.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ *
+ * eflags controls ExecutorPrep() behavior during initial pruning.
+ * Normally zero; set EXEC_FLAG_EXPLAIN_GENERIC to suppress pruning
+ * in EXPLAIN (GENERIC_PLAN). Need not match the eflags later passed
+ * to ExecutorStart().
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_estates; /* one EState per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate EState and its fields */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to control ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +266,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 39dab8fcc05..39770f3b6d6 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4860,9 +4860,7 @@ select c.relname
relname
--------------
prunelock_p1
- prunelock_p2
- prunelock_p3
-(3 rows)
+(1 row)
commit;
deallocate prunelock_q;
@@ -4904,6 +4902,50 @@ select c.relname
commit;
deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+execute prunelock_mt_q(4, 5);
+deallocate prunelock_mt_q;
drop table prunelock_p;
reset plan_cache_mode;
-reset enable_partition_pruning;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..1d69ab0a1c2 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,65 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 229c5eb370c..87672ad40f7 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1499,6 +1499,28 @@ select c.relname
commit;
deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
drop table prunelock_p;
reset plan_cache_mode;
-reset enable_partition_pruning;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..139b4688fd6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,54 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+
+reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v7-0006-Reuse-partition-pruning-results-in-parallel-worke.patch (8.2K, 6-v7-0006-Reuse-partition-pruning-results-in-parallel-worke.patch)
From 9c94b3751ae0c9decc337e33de2750a954a88d6f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 22:17:47 +0900
Subject: [PATCH v7 6/6] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce CheckInitialPruningResultsInWorker() (debug-builds only)
to verify that the results match what the worker would compute. This
check helps catch inconsistencies across leader and worker pruning
logic.
---
src/backend/executor/execParallel.c | 108 +++++++++++++++++++++++++++-
1 file changed, 107 insertions(+), 1 deletion(-)
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 024780d3516..d337bf8c081 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -67,6 +68,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -141,6 +144,8 @@ static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
/* Helper function that runs in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static void CheckInitialPruningResultsInWorker(EState *estate);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -620,12 +625,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -654,6 +665,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -680,6 +693,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -781,6 +804,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1280,10 +1313,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ EState *prep_estate = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1296,12 +1334,80 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep_estate = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, false, 0);
+ Assert(prep_estate);
+
+ prep_estate->es_part_prune_results = part_prune_results;
+ prep_estate->es_unpruned_relids =
+ bms_add_members(prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * A debug-build-only check that the pruning results passed from the
+ * leader match what the worker would independently compute.
+ */
+ CheckInitialPruningResultsInWorker(prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options,
- NULL);
+ prep_estate);
+}
+
+/*
+ * CheckInitialPruningResultsInWorker
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ *
+ * Only performed in debug builds.
+ */
+static void
+CheckInitialPruningResultsInWorker(EState *estate)
+{
+#ifdef USE_ASSERT_CHECKING
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (!bms_equal(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unprunable_relids in parallel worker");
+ }
+#endif
}
/*
--
2.47.3
[application/octet-stream] v7-0001-Refactor-partition-pruning-initialization-for-cla.patch (10.2K, 7-v7-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From 6f2c9cc7a30d38cb2606595f62b62c77e2aba6e9 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 15:08:52 +0900
Subject: [PATCH v7 1/6] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called at a time when the PARAM_EXEC
parameters have not yet been set up.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execPartition.c | 103 +++++++++++++++++++--------
src/include/executor/execPartition.h | 1 +
3 files changed, 74 insertions(+), 31 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index bfd3ebc601e..654f9246ad0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -868,6 +868,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
*/
+ ExecCreatePartitionPruneStates(estate);
ExecDoInitialPruning(estate);
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index bab294f5e91..20c3513fabe 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -184,8 +184,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1942,6 +1941,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1966,6 +1968,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1973,11 +1998,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1995,20 +2020,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -2016,8 +2034,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2135,14 +2151,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2376,8 +2390,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2389,10 +2403,29 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
+ }
+#endif
+ }
j++;
}
@@ -2489,9 +2522,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2519,9 +2553,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 82063ec2a16..4c96808c376 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2026-03-19 17:20 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2026-03-19 17:20 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Mon, Mar 9, 2026 at 1:41 PM Amit Langote <[email protected]> wrote:
> On Sat, Mar 7, 2026 at 6:54 PM Amit Langote <[email protected]> wrote:
> > Attached is v6 of the patch series. I've been working toward
> > committing this, so I wanted to lay out the ExecutorPrep() design and
> > the key trade-offs before doing so.
> >
> > When a cached generic plan references a partitioned table,
> > GetCachedPlan() locks all partitions upfront via
> > AcquireExecutorLocks(), even those that initial pruning will
> > eliminate. But initial partition pruning only runs later during
> > ExecutorStart(). Moving pruning earlier requires some executor setup
> > (range table, permissions, pruning state), and ExecutorPrep() is the
> > vehicle for that. Unlike the approach reverted last May, this
> > keeps the CachedPlan itself unchanged -- all per-execution state flows
> > through a separate CachedPlanPrepData that the caller provides.
> >
> > The approach also keeps GetCachedPlan()'s interface
> > backward-compatible: the new CachedPlanPrepData argument is optional.
> > If a caller passes NULL, all partitions are locked as before and
> > nothing changes. This means existing callers and any new code that
> > calls GetCachedPlan() without caring about pruning-aware locking just
> > works.
> >
> > The risk is on the other side: if a caller does pass a
> > CachedPlanPrepData, GetCachedPlan() will lock only the surviving
> > partitions and populate prep_estates with the EStates that
> > ExecutorPrep() created. The caller then must make those EStates
> > available to ExecutorStart() -- via QueryDesc->estate,
> > portal->prep_estates, or the equivalent path for SPI and SQL
> > functions. If it fails to do so, ExecutorStart() will call
> > ExecutorPrep() again, which may compute different pruning results than
> > the original call, potentially expecting locks on relations that were
> > never acquired. The executor would then operate on relations it
> > doesn't hold locks on.
> >
> > So the contract is: if you opt in to pruning-aware locking by passing
> > CachedPlanPrepData, you must complete the pipeline by delivering the
> > prep EStates to the executor. In the current patch, all the call sites
> > that pass a CachedPlanPrepData (portals, SPI, EXECUTE, SQL functions,
> > EXPLAIN) do thread the EStates through correctly, and I've tried to
> > make the plumbing straightforward enough that it's hard to get wrong.
> > But it is a new invariant that didn't exist before, and a caller that
> > gets it wrong would fail silently rather than with an obvious error.
> >
> > To catch such violations, I've added a debug-only check in
> > standard_ExecutorStart() that fires when no prep EState was provided.
> > It iterates over the plan's rtable and verifies that every lockable
> > relation is actually locked. It should always be true if
> > AcquireExecutorLocks() locked everything, but would fail if
> > pruning-aware locking happened upstream and the caller dropped the
> > prep EState. The check is skipped in parallel workers, which acquire
> > relation locks lazily in ExecGetRangeTableRelation().
> >
> > + if (queryDesc->estate == NULL)
> > + {
> > +#ifdef USE_ASSERT_CHECKING
> > + if (!IsParallelWorker())
> > + {
> > + ListCell *lc;
> > +
> > + foreach(lc, queryDesc->plannedstmt->rtable)
> > + {
> > + RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
> > +
> > + if (rte->rtekind == RTE_RELATION ||
> > + (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
> > + Assert(CheckRelationOidLockedByMe(rte->relid,
> > + rte->rellockmode,
> > + true));
> > + }
> > + }
> > +#endif
> > + queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
> > + queryDesc->params,
> > + CurrentResourceOwner,
> > + true,
> > + eflags);
> > + }
> > +#ifdef USE_ASSERT_CHECKING
> > + else
> > + {
> > + /*
> > + * A prep EState was provided, meaning pruning-aware locking
> > + * should have locked at least the unpruned relations.
> > + */
> > + if (!IsParallelWorker())
> > + {
> > + int rtindex = -1;
> > +
> > + while ((rtindex =
> > bms_next_member(queryDesc->estate->es_unpruned_relids,
> > + rtindex)) >= 0)
> > + {
> > + RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
> > +
> > + Assert(rte->rtekind == RTE_RELATION ||
> > + (rte->rtekind == RTE_SUBQUERY &&
> > + rte->relid != InvalidOid));
> > + Assert(CheckRelationOidLockedByMe(rte->relid,
> > + rte->rellockmode, true));
> > + }
> > + }
> > + }
> > +#endif
> >
> > So the invariant is: if no prep EState was provided, every relation in
> > the plan is locked; if one was provided, at least the unpruned
> > relations are locked. Both are checked in assert builds.
> >
> > I think this covers the main concerns, but I may be missing something.
> > If anyone sees a problem with this approach, I'd like to hear about
> > it.
>
> Here's v7. Some plancache.c changes that I'd made were in the wrong
> patch in v6; this version puts them where they belong.
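As an aside on the invariant quoted above (no prep EState means every
relation is locked; a prep EState means at least the unpruned relations
are locked), here is a minimal standalone sketch in C. It is not code
from the patch: ToyRel and toy_lock_invariant_holds() are hypothetical
names, and lock state is modelled as plain booleans rather than through
CheckRelationOidLockedByMe().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one range-table entry: whether this session holds
 * a lock on it and whether it survived "initial" pruning.
 */
typedef struct ToyRel
{
    bool locked;
    bool unpruned;
} ToyRel;

/*
 * Returns true if the lock invariant holds:
 *  - no prep state supplied: every relation must be locked;
 *  - prep state supplied: every unpruned relation must be locked.
 */
static bool
toy_lock_invariant_holds(const ToyRel *rels, int nrels, bool have_prep_state)
{
    for (int i = 0; i < nrels; i++)
    {
        if (have_prep_state && !rels[i].unpruned)
            continue;           /* pruned relations may stay unlocked */
        if (!rels[i].locked)
            return false;
    }
    return true;
}
```

In this toy model, a caller that took pruning-aware locks but then
dropped the prep state would be checked with have_prep_state = false,
and the function would return false for any pruned, unlocked relation.
That is the failure the debug-only check is meant to surface.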
Attached is an updated set. One more fix: I added an Assert in
SPI_cursor_open_internal()'s !plan->saved path to verify that
prep_estates is NIL. Unsaved plans always take the custom plan path,
so pruning-aware locking never applies, but it's worth guarding
explicitly since the copyObject/ReleaseCachedPlan sequence that
follows would not be safe otherwise. Also changed
SPI_plan_get_cached_plan() to pass NULL for cprep, since it only
returns the CachedPlan pointer and has no way to deliver prep_estates
to anyone.
Stepping back -- the core question is whether running executor logic
(pruning) inside GetCachedPlan() is acceptable at all. The plan cache
and executor have always had a clean boundary: plan cache locks
everything, executor runs. This optimization necessarily crosses that
line, because the information needed to decide which locks to skip
(pruning results) can only come from executor machinery.
The proposed approach has GetCachedPlan() call ExecutorPrep() to do a
limited subset of executor work (range table init, permissions,
pruning), carry the results out through CachedPlanPrepData, and leave
the CachedPlan itself untouched. The executor already has a multi-step
protocol: start/run/end. Making it prep/start/run/end is just a finer
decomposition of what InitPlan() was already doing inside
ExecutorStart().
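The prep/start split described above can be sketched with a small
standalone C model. This is only an illustration under simplifying
assumptions, not the patch's implementation: ToyPlan, ToyPrepState,
toy_executor_prep() and toy_executor_start() are hypothetical names,
pruning is reduced to matching one integer parameter against
list-partition bounds, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PARTS 8

/* Hypothetical stand-in for a cached generic plan over a partitioned table. */
typedef struct ToyPlan
{
    int nparts;                     /* number of partitions */
    int bound[TOY_MAX_PARTS];       /* value accepted by each partition */
} ToyPlan;

/* Hypothetical stand-in for the EState produced by the prep step. */
typedef struct ToyPrepState
{
    bool unpruned[TOY_MAX_PARTS];   /* partitions surviving initial pruning */
    int  nunpruned;
} ToyPrepState;

static int toy_prune_calls = 0;     /* counts how often pruning ran */

/* "prep": run initial pruning for the given parameter value. */
static void
toy_executor_prep(const ToyPlan *plan, int param, ToyPrepState *prep)
{
    toy_prune_calls++;
    prep->nunpruned = 0;
    for (int i = 0; i < plan->nparts; i++)
    {
        prep->unpruned[i] = (plan->bound[i] == param);
        if (prep->unpruned[i])
            prep->nunpruned++;
    }
}

/*
 * "start": reuse the caller's prep state if one was supplied, otherwise do
 * the prep work locally.  Returns the number of partitions initialized.
 */
static int
toy_executor_start(const ToyPlan *plan, int param, ToyPrepState *prep)
{
    ToyPrepState local;

    if (prep == NULL)
    {
        toy_executor_prep(plan, param, &local);
        prep = &local;
    }
    return prep->nunpruned;
}
```

When the caller hands its prep state to the start step, pruning runs
once. When it passes NULL, the start step repeats the prep work itself,
which mirrors the fallback in standard_ExecutorStart() when
queryDesc->estate is NULL.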
Of the attached patches, I'm targeting 0001-0003 for commit. 0004 (SQL
function support) and 0005 (parallel worker reuse) are useful
follow-ons but not essential. The optimization works without them for
most cases, and they can be reviewed and committed separately.
If there's a cleaner way to avoid locking pruned partitions without
the plumbing this patch adds, I haven't found it in the year since the
revert. I'd welcome a pointer if you see one. Failing that, I think
this is the right trade-off, but it's a judgment call about where to
hold your nose.
Tom, I'd value your opinion on whether this approach is something
you'd be comfortable seeing in the tree.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v8-0005-Reuse-partition-pruning-results-in-parallel-worke.patch (11.0K, 2-v8-0005-Reuse-partition-pruning-results-in-parallel-worke.patch)
From 4c12c380b75b8684e9c41c80d0c77027cf592e17 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 19 Mar 2026 20:03:58 +0900
Subject: [PATCH v8 5/5] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Introduce CheckInitialPruningResultsInWorker() (debug-builds only)
to verify that the results match what the worker would compute. This
check helps catch inconsistencies across leader and worker pruning
logic.
---
src/backend/executor/execMain.c | 10 +--
src/backend/executor/execParallel.c | 108 +++++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 2 +-
src/include/executor/executor.h | 3 +-
4 files changed, 116 insertions(+), 7 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 0f95ad88497..9a3700e672f 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -207,7 +207,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
queryDesc->params,
CurrentResourceOwner,
- eflags);
+ eflags, true);
}
#ifdef USE_ASSERT_CHECKING
else
@@ -330,7 +330,8 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
* ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
*
* Performs range table initialization, permission checks, and initial
- * partition pruning if partPruneInfos are present.
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true; false in a parallel worker.
*
* Returns an EState that the caller must either pass to ExecutorStart()
* for reuse or free via FreeExecutorState() if execution will not proceed.
@@ -340,7 +341,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
EState *
ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
- int eflags)
+ int eflags, bool do_initial_pruning)
{
ResourceOwner oldowner;
EState *estate;
@@ -386,7 +387,8 @@ ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
* to es_part_prune_infos.
*/
ExecCreatePartitionPruneStates(estate);
- ExecDoInitialPruning(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
CurrentResourceOwner = oldowner;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 024780d3516..2de4b35a16e 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -67,6 +68,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -141,6 +144,8 @@ static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
/* Helper function that runs in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static void CheckInitialPruningResultsInWorker(EState *estate);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -620,12 +625,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -654,6 +665,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -680,6 +693,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -781,6 +804,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1280,10 +1313,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ EState *prep_estate = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1296,12 +1334,80 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep_estate = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, 0, false);
+ Assert(prep_estate);
+
+ prep_estate->es_part_prune_results = part_prune_results;
+ prep_estate->es_unpruned_relids =
+ bms_add_members(prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * In assert-enabled builds only, check that the pruning results passed
+ * from the leader match what the worker would independently compute.
+ */
+ CheckInitialPruningResultsInWorker(prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options,
- NULL);
+ prep_estate);
+}
+
+/*
+ * CheckInitialPruningResultsInWorker
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ *
+ * Only performed in assert-enabled builds (USE_ASSERT_CHECKING).
+ */
+static void
+CheckInitialPruningResultsInWorker(EState *estate)
+{
+#ifdef USE_ASSERT_CHECKING
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (!bms_equal(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unpruned_relids in parallel worker");
+ }
+#endif
}
/*
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 2d4c57d3deb..0dd4f40c964 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -2102,7 +2102,7 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
}
prep_estate = ExecutorPrep(plannedstmt, cprep->params,
- cprep->owner, cprep->eflags);
+ cprep->owner, cprep->eflags, true);
Assert(prep_estate);
cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 24604120c27..38848ba0651 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -240,7 +240,8 @@ extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern EState *ExecutorPrep(PlannedStmt *pstmt,
ParamListInfo params,
ResourceOwner owner,
- int eflags);
+ int eflags,
+ bool do_initial_pruning);
/*
* Walk a prep_estates list in step with a parallel stmt_list iteration.
--
2.47.3
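As a rough illustration of the leader-to-worker handoff in the patch above, the sketch below stores two NUL-terminated strings in a toy keyed table, sized as strlen() + 1 like the patch does, and looks them up again by key. ToyToc, toy_toc_store_string() and toy_toc_lookup() are hypothetical stand-ins and not PostgreSQL's shm_toc API. The key values are likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical keys, loosely mirroring the two new PARALLEL_KEY_* entries. */
#define TOY_KEY_PRUNE_RESULTS   UINT64_C(0xB)
#define TOY_KEY_UNPRUNED_RELIDS UINT64_C(0xC)

#define TOY_TOC_MAX_ENTRIES 4
#define TOY_TOC_SPACE       256

/*
 * Toy stand-in for a shared-memory table of contents (not shm_toc).
 * Must be zero-initialized before first use.
 */
typedef struct ToyToc
{
	uint64_t	keys[TOY_TOC_MAX_ENTRIES];
	size_t		offsets[TOY_TOC_MAX_ENTRIES];
	int			nentries;
	size_t		used;
	char		space[TOY_TOC_SPACE];
} ToyToc;

/*
 * "Leader" side: copy a NUL-terminated string, terminator included, into
 * the table under 'key'.  Returns the bytes stored, or 0 if out of room.
 */
size_t
toy_toc_store_string(ToyToc *toc, uint64_t key, const char *data)
{
	size_t		len;

	assert(toc != NULL && data != NULL);
	len = strlen(data) + 1;
	if (toc->nentries >= TOY_TOC_MAX_ENTRIES ||
		toc->used + len > TOY_TOC_SPACE)
		return 0;

	memcpy(toc->space + toc->used, data, len);
	toc->keys[toc->nentries] = key;
	toc->offsets[toc->nentries] = toc->used;
	toc->nentries++;
	toc->used += len;
	return len;
}

/* "Worker" side: find the string stored under 'key', or NULL if absent. */
const char *
toy_toc_lookup(const ToyToc *toc, uint64_t key)
{
	assert(toc != NULL);
	for (int i = 0; i < toc->nentries; i++)
	{
		if (toc->keys[i] == key)
			return toc->space + toc->offsets[i];
	}
	return NULL;
}
```

The strings passed in are arbitrary placeholders. In the patch itself the payload comes from nodeToString() and is turned back into nodes with stringToNode() in the worker.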
[application/octet-stream] v8-0003-Use-pruning-aware-locking-in-cached-plans.patch (41.1K, 3-v8-0003-Use-pruning-aware-locking-in-cached-plans.patch)
From 2e637cbc71a14775e161bde21e1036eca2644a2b Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 19 Mar 2026 19:02:04 +0900
Subject: [PATCH v8 3/5] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry the EStates created by
ExecutorPrep() through the plan caching layer. The prep_estates
list is indexed one-to-one with CachedPlan->stmt_list and is
populated when GetCachedPlan() prepares a reused generic plan.
Adjust call sites in SPI, functions, portals, and EXPLAIN to
propagate this data.
Partition pruning expressions may call PL functions that require
an active snapshot (e.g., via EnsurePortalSnapshotExists()).
AcquireExecutorLocksUnpruned() establishes one before calling
ExecutorPrep() if needed, ensuring these expressions can execute
correctly during plan cache validation.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
Regression tests are included to verify:
- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.
Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this was
an optimization (avoid processing pruned partitions); it is now a
correctness requirement, because pruned partitions may not be locked.
ExecGetRangeTableRelation() already enforces this with an error when
called on a pruned relation.
---
src/backend/commands/prepare.c | 17 +-
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/executor/spi.c | 22 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 20 ++
src/backend/tcop/postgres.c | 7 +-
src/backend/utils/cache/plancache.c | 257 +++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 35 ++-
src/test/regress/expected/partition_prune.out | 145 ++++++++++
src/test/regress/expected/plancache.out | 62 +++++
src/test/regress/sql/partition_prune.sql | 77 ++++++
src/test/regress/sql/plancache.sql | 51 ++++
15 files changed, 689 insertions(+), 24 deletions(-)
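The lock-set arithmetic described in the commit message can be sketched as follows. Unprunable relations are locked first. The second step locks whatever survived initial pruning minus the relations already locked, plus each first result relation even if it was pruned. This is a toy under stated assumptions: RelidSet, RELID_BIT and toy_second_phase_locks() are hypothetical names, and a 64-bit mask stands in for a Bitmapset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset of RT indexes (valid range 1..63). */
typedef uint64_t RelidSet;

#define RELID_BIT(rti) (UINT64_C(1) << (rti))

/*
 * Compute the relations to lock after initial pruning:
 *   (unpruned - unprunable) + first result rels.
 * 'unprunable' is assumed to have been locked already, so it is filtered
 * out to avoid locking those relations twice.  Pass nfirst = 0 for a
 * statement that has no result relations.
 */
RelidSet
toy_second_phase_locks(RelidSet unpruned, RelidSet unprunable,
					   const int *first_result_rels, int nfirst)
{
	RelidSet	lock_relids = unpruned & ~unprunable;

	for (int i = 0; i < nfirst; i++)
	{
		assert(first_result_rels[i] > 0 && first_result_rels[i] < 64);
		/* Added unconditionally, even if pruning removed it. */
		lock_relids |= RELID_BIT(first_result_rels[i]);
	}
	return lock_relids;
}
```

For example, with RT index 1 unprunable, indexes 1 and 4 unpruned, and index 2 recorded as a first result relation, the second step locks indexes 2 and 4.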
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index c7bab14b633..fec83cc6fd4 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -156,6 +156,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -195,7 +196,9 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -207,7 +210,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -577,6 +580,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_estates;
ListCell *p;
@@ -635,8 +639,13 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -655,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/* Explain each query */
prep_lc = list_head(prep_estates);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 952a784c924..c0ca72b38dd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -699,6 +699,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..9230f2b554f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4865,8 +4865,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -4880,6 +4880,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 380bbc44e97..f1d84f7a350 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1580,6 +1580,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1660,7 +1661,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1670,7 +1674,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
* so must copy the plan into the portal's context. An error here
* will result in leaking our refcount on the plan, but it doesn't
* matter because the plan is unsaved and hence transient anyway.
+ *
+ * Unsaved plans use custom plans, so prep should be a no-op.
*/
+ Assert(cprep.prep_estates == NIL);
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
MemoryContextSwitchTo(oldcontext);
@@ -1686,7 +1693,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -2104,7 +2111,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2503,6 +2511,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
ListCell *lc2;
List *prep_estates;
ListCell *prep_lc;
+ CachedPlanPrepData cprep = {0};
spicallbackarg.query = plansource->query_string;
@@ -2577,11 +2586,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..afa61d357c5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1b5b9b5ed9c..ddb7902bc89 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,26 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of
+ * initially prunable relations. We use bms_next_member() to get
+ * the lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition
+ * expansion preserves RT index order. There is one ModifyTable
+ * per query level, so this captures exactly one entry per level.
+ * ExecInitModifyTable() asserts that the recorded index matches
+ * what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 355a490cde9..de362ff1672 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1637,6 +1637,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2018,7 +2019,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2031,7 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 182c16e9b9a..2d4c57d3deb 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,14 +93,17 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksAll(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -942,6 +945,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning, so
+ * that only those partitions are locked. The EStates created to do so are
+ * saved in cprep for later reuse by ExecutorStart().
+ *
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
@@ -949,7 +957,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -983,7 +991,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, true, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1005,7 +1016,13 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, false, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, false);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -1285,6 +1302,15 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function
+ * performs initial pruning via ExecutorPrep() and locks only the
+ * surviving partitions. The resulting EStates are stored in
+ * cprep->prep_estates and must be delivered to ExecutorStart() via
+ * QueryDesc->estate (or the equivalent portal/SPI path). Failure
+ * to do so means the executor will operate on relations for which
+ * locks were never acquired. Passing NULL for cprep is always safe;
+ * all partitions are locked as before.
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1295,7 +1321,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1317,7 +1344,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (CheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1904,11 +1933,13 @@ QueryListGetPrimaryStmt(List *stmts)
}
/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ * AcquireExecutorLocksAll: acquire locks needed for execution of a cached
+ * plan; or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksAll(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1986,212 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the EState pointer for each PlannedStmt to cprep->prep_estates
+ *
+ * On release, it:
+ * - looks up the EState for each PlannedStmt from cprep->prep_estates
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ *
+ * prep_estates is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for EState happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext = MemoryContextSwitchTo(cprep->context);
+ ListCell *lc1;
+ List *prep_estates;
+ ListCell *prep_lc;
+
+ Assert(cprep);
+
+ /*
+ * When releasing locks, use the EState list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_estates = cprep->prep_estates;
+ Assert(acquire || list_length(prep_estates) == list_length(stmt_list));
+
+ prep_lc = list_head(prep_estates);
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ EState *prep_estate;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocksAll(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_estates = lappend(cprep->prep_estates, NULL);
+ else
+ (void) next_prep_estate(prep_estates, &prep_lc);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ /*
+ * Pruning expressions may call PL functions that require an active
+ * snapshot (e.g., via EnsurePortalSnapshotExists()). Establish one
+ * if needed.
+ */
+ bool snap_pushed = false;
+
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snap_pushed = true;
+ }
+
+ prep_estate = ExecutorPrep(plannedstmt, cprep->params,
+ cprep->owner, cprep->eflags);
+ Assert(prep_estate);
+ cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
+
+ if (snap_pushed)
+ PopActiveSnapshot();
+ }
+ else
+ prep_estate = next_prep_estate(prep_estates, &prep_lc);
+
+ if (prep_estate)
+ {
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * We must always lock the first result relation of each
+ * ModifyTable node in the plan, that is, each relation listed in
+ * plannedstmt->firstResultRels, to satisfy the executor
+ * assumptions described in ExecInitModifyTable(). This can be
+ * wasteful, because the first result relation may not be needed
+ * at all if other result relations are unpruned and thus
+ * sufficient for the ModifyTable node's needs. Unfortunately, we
+ * don't have a per-node unpruned_relids set from which to
+ * determine whether other result relations are included, so we
+ * lock it unconditionally.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * CachedPlanPrepCleanup
+ * Clean up EState built for a generic plan.
+ *
+ * This is used in the corner case where CheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the ExecutorPrep() EStates stored in CachedPlanPrepData,
+ * since the executor will never see or clean them up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+ ResourceOwner oldowner;
+
+ if (cprep == NULL)
+ return;
+
+ /* Switch to owner that ExecutorPrep() would have used. */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = cprep->owner;
+ foreach(lc, cprep->prep_estates)
+ {
+ EState *prep_estate = (EState *) lfirst(lc);
+
+ if (prep_estate == NULL)
+ continue;
+
+ ExecCloseRangeTableRelations(prep_estate);
+ FreeExecutorState(prep_estate);
+ }
+ CurrentResourceOwner = oldowner;
+
+ list_free(cprep->prep_estates);
+ cprep->prep_estates = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 27758ec16fe..4fd9d9bcc56 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..55279cbbda8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -121,6 +121,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 984c51515c6..c22f832d0b1 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,38 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_estates is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. If the
+ * plan is found invalid after locking, the same list is used to determine
+ * which relations to unlock before retrying.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ *
+ * eflags controls ExecutorPrep() behavior during initial pruning.
+ * Normally zero; set EXEC_FLAG_EXPLAIN_GENERIC to suppress pruning
+ * in EXPLAIN (GENERIC_PLAN). Need not match the eflags later passed
+ * to ExecutorStart().
+ *
+ * prep_estates must reach ExecutorStart() to be adopted for execution.
+ * If the plan is invalidated before that happens, CachedPlanPrepCleanup()
+ * frees them instead. The EStates are allocated in 'context' and their
+ * resources tracked under 'owner', which the caller sets to match the
+ * execution environment (e.g., portal context and resowner).
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_estates; /* one EState per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate EState and its fields */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to control ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +272,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..8e0cc98baca 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,148 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..1d69ab0a1c2 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,65 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..804dd3c8f4e 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,80 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..139b4688fd6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,54 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+
+reset plan_cache_mode;
--
2.47.3
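
The CachedPlanPrepData comment in the patch above says that prep EStates are either adopted by ExecutorStart() or, if the plan is invalidated first, freed by CachedPlanPrepCleanup(). The following is a minimal standalone sketch of the cleanup half of that contract, not the patch's code: ToyEState, ToyPrepData, toy_prep(), toy_prep_cleanup() and toy_demo_invalidated_plan() are all hypothetical names, and a plain array stands in for the PostgreSQL List. It only shows that NULL slots are skipped and that the container is reset afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an executor state; not the real EState. */
typedef struct ToyEState
{
	int			adopted;
} ToyEState;

/* Hypothetical stand-in for CachedPlanPrepData: one slot per statement. */
typedef struct ToyPrepData
{
	ToyEState **prep_estates;	/* entries may be NULL */
	size_t		nstmts;
} ToyPrepData;

static int	toy_live_estates = 0;	/* prepared states not yet freed */

/* Allocate one prepared state and count it as live. */
static ToyEState *
toy_prep(void)
{
	ToyEState  *es = calloc(1, sizeof(ToyEState));

	if (es != NULL)
		toy_live_estates++;
	return es;
}

/* Free every prepared state that was never handed to execution. */
static void
toy_prep_cleanup(ToyPrepData *cprep)
{
	if (cprep == NULL)
		return;

	for (size_t i = 0; i < cprep->nstmts; i++)
	{
		if (cprep->prep_estates[i] == NULL)
			continue;			/* nothing was prepared for this statement */
		free(cprep->prep_estates[i]);
		toy_live_estates--;
	}

	free(cprep->prep_estates);
	cprep->prep_estates = NULL;
	cprep->nstmts = 0;
}

/* Prepare three statements (slot 1 left NULL), then pretend the plan was
 * invalidated and clean up.  Returns the number of leaked states. */
static int
toy_demo_invalidated_plan(void)
{
	ToyPrepData cprep;

	cprep.nstmts = 3;
	cprep.prep_estates = calloc(cprep.nstmts, sizeof(ToyEState *));
	if (cprep.prep_estates == NULL)
		return -1;

	cprep.prep_estates[0] = toy_prep();
	cprep.prep_estates[2] = toy_prep();

	toy_prep_cleanup(&cprep);
	return toy_live_estates;
}
```

The real function additionally switches CurrentResourceOwner to cprep->owner around the loop and closes range table relations before freeing each EState; the toy model leaves both out.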
[application/octet-stream] v8-0004-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (7.8K, 4-v8-0004-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From 2ab5fefb9644118a1f1528a53b9a6af90e063edb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:09:23 +0900
Subject: [PATCH v8 4/5] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
prep EState pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep EState's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.
This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 29 +++++++++++++--
src/test/regress/expected/plancache.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 34 ++++++++++++++++++
3 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index c0ca72b38dd..f246f051c25 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -73,6 +73,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ EState *prep_estate; /* EState created in ExecutorPrep() for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -658,6 +659,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ ListCell *prep_lc;
/*
* Clean up after previous query, if there was one.
@@ -696,11 +699,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -721,9 +733,11 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ prep_lc = list_head(cprep.prep_estates);
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ EState *prep_estate = next_prep_estate(cprep.prep_estates, &prep_lc);
execution_state *newes;
/*
@@ -765,6 +779,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep_estate = prep_estate;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1363,6 +1378,15 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per call lifetime.
+ */
+ if (es->prep_estate)
+ MemoryContextSetParent(es->prep_estate->es_query_cxt,
+ fcache->subcontext);
+
es->qd = CreateQueryDesc(es->stmt,
fcache->func->src,
GetActiveSnapshot(),
@@ -1371,7 +1395,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
0,
- NULL);
+ es->prep_estate);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
@@ -1462,6 +1486,7 @@ postquel_end(execution_state *es, SQLFunctionCachePtr fcache)
FreeQueryDesc(es->qd);
es->qd = NULL;
+ es->prep_estate = NULL;
MemoryContextSwitchTo(oldcontext);
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 1d69ab0a1c2..371673a6e96 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -459,4 +459,52 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select * from sqlf_base order by 1;
+ id | val
+----+-----
+ 1 | 30
+(1 row)
+
+select * from sqlf_log order by 1;
+ id | note
+----+----------------
+ 1 | logged by rule
+ 1 | logged by rule
+(2 rows)
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 139b4688fd6..b89c9ad69a4 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -273,4 +273,38 @@ drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+select * from sqlf_base order by 1;
+select * from sqlf_log order by 1;
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
--
2.47.3
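
The functions.c hunk above walks cplan->stmt_list and cprep.prep_estates in lockstep through next_prep_estate(), whose definition does not appear in this part of the thread. Below is a rough standalone model of what such a helper might do, inferred only from its call sites (it is also called with an empty list in ExplainExecuteQuery() in patch 0002). ToyCell, toy_next_prep_estate() and toy_demo_lockstep() are hypothetical names, and a hand-rolled linked list stands in for List/ListCell.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list cell; stands in for List/ListCell. */
typedef struct ToyCell
{
	void	   *ptr;			/* prepared state, or NULL */
	struct ToyCell *next;
} ToyCell;

/*
 * Return the entry under *cursor and advance the cursor.  An empty list or
 * an exhausted cursor yields NULL, so callers that never ran the prep step
 * can use the same loop as callers that did.
 */
static void *
toy_next_prep_estate(ToyCell *list, ToyCell **cursor)
{
	void	   *result;

	if (list == NULL || *cursor == NULL)
		return NULL;

	result = (*cursor)->ptr;
	*cursor = (*cursor)->next;
	return result;
}

/*
 * Walk three "statements" and count how many have a prepared state.  With
 * use_list != 0 the prep list is [a, NULL, b]; otherwise it is empty.
 */
static int
toy_demo_lockstep(int use_list)
{
	int			a = 1;
	int			b = 2;
	ToyCell		c3 = {&b, NULL};
	ToyCell		c2 = {NULL, &c3};
	ToyCell		c1 = {&a, &c2};
	ToyCell    *list = use_list ? &c1 : NULL;
	ToyCell    *cursor = list;
	int			nprepped = 0;

	for (int stmt = 0; stmt < 3; stmt++)
	{
		if (toy_next_prep_estate(list, &cursor) != NULL)
			nprepped++;
	}
	return nprepped;
}
```

Returning NULL for a missing entry matches how the callers in the patch treat a NULL prep_estate: it is passed to CreateQueryDesc() unchanged, and ExecutorStart() then does the preparation itself.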
[application/octet-stream] v8-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (27.1K, 5-v8-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From a2a0befc44d25df8b549644a7e179923270a0fc6 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 11 Nov 2025 21:47:46 +0900
Subject: [PATCH v8 2/5] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorPrep() builds an EState containing the executor
metadata needed before plan execution, including partition
pruning state where partPruneInfos are present, and returns it
directly to the caller.
ExecutorStart() now checks if QueryDesc->estate is already set
(indicating ExecutorPrep() was called earlier). If so, it reuses
the EState to avoid redoing range table setup and pruning.
Otherwise, it invokes ExecutorPrep() itself and adopts the
resulting EState for the duration of the query. This keeps the
executor startup behavior unchanged while making the setup work
callable separately when needed.
CreateQueryDesc() grows a prep_estate argument to accept an
optionally pre-created EState and stores it in the QueryDesc.
Portals, SPI, SQL functions, and EXPLAIN are wired to carry
optional EState pointers alongside the PlannedStmt list, but most
callers still pass NULL and let ExecutorStart() perform the setup
lazily.
ExecutorPrep() requires the caller to have established an active
snapshot, as partition pruning expressions may call PL functions
that internally require one (e.g., via EnsurePortalSnapshotExists()).
Update executor/README and related comments to document the new
control flow and the separation between preparation and execution.
Note that as of this commit, ExecutorStart() is the only caller of
ExecutorPrep(), so there is no change in behavior. Later commits
will add specialized callers that invoke ExecutorPrep() earlier to
enable pruning-aware locking in cached plans.
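
The reuse-or-create rule described above can be modeled in a few lines. This is a standalone sketch under simplifying assumptions, not the executor code: ToyEState, ToyQueryDesc, toy_executor_prep(), toy_executor_start() and toy_demo_start() are hypothetical, and the "prep" step only sets a flag. It is meant to show that preparation runs exactly once whether or not the caller did it early.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for EState and QueryDesc. */
typedef struct ToyEState
{
	int			pruning_done;
} ToyEState;

typedef struct ToyQueryDesc
{
	ToyEState  *estate;			/* NULL until preparation has run */
} ToyQueryDesc;

static int	toy_prep_calls = 0;
static ToyEState toy_storage[4];	/* avoids malloc in this toy model */

/* Models ExecutorPrep(): build the state and do the up-front work. */
static ToyEState *
toy_executor_prep(void)
{
	ToyEState  *es = &toy_storage[toy_prep_calls % 4];

	es->pruning_done = 1;
	toy_prep_calls++;
	return es;
}

/* Models ExecutorStart(): reuse a prepared state, or prepare lazily. */
static void
toy_executor_start(ToyQueryDesc *qd)
{
	if (qd->estate == NULL)
		qd->estate = toy_executor_prep();

	/* the rest of startup would run against qd->estate here */
	assert(qd->estate->pruning_done);
}

/* Returns how many times preparation ran for one query. */
static int
toy_demo_start(int prep_early)
{
	ToyQueryDesc qd = {NULL};

	toy_prep_calls = 0;
	if (prep_early)
		qd.estate = toy_executor_prep();	/* e.g. done by a plan cache */

	toy_executor_start(&qd);
	return toy_prep_calls;
}
```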
---
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/executor/README | 11 +-
src/backend/executor/execMain.c | 164 +++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 9 +-
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 24 +++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 5 +-
src/include/executor/executor.h | 26 +++++
src/include/nodes/execnodes.h | 1 -
src/include/utils/portal.h | 2 +
20 files changed, 229 insertions(+), 52 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 499ce9ad3db..e09303491d2 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -877,7 +877,7 @@ BeginCopyTo(ParseState *pstate,
cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/*
* Call ExecutorStart to prepare the plan for execution.
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 270e9bf3110..b4a9808955a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -336,7 +336,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 296ea8a1ed2..02027c429e1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -372,7 +372,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -494,7 +494,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -552,7 +553,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create a QueryDesc for the query */
queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ dest, params, queryEnv, instrument_option,
+ prep_estate);
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b98801d08f2..939e7a632f0 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1174,7 +1174,7 @@ execute_sql_string(const char *sql, const char *filename)
qdesc = CreateQueryDesc(stmt,
sql,
GetActiveSnapshot(), NULL,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2cdfdcf984b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..1e880a6d7c9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NIL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..c7bab14b633 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -577,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_estates;
ListCell *p;
+ ListCell *prep_lc;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -652,14 +655,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_estates = NIL;
/* Explain each query */
+ prep_lc = list_head(prep_estates);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep_estate,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..d749ceb6687 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,18 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Creates EState,
+ performs range table initialization, permission checks, and initial
+ partition pruning. Returns the EState that ExecutorStart() should
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if not already done, indicated by NULL QueryDesc.estate)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index c58a2abe9a7..0f95ad88497 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -147,7 +148,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +173,70 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When
+ * no prep EState was provided, AcquireExecutorLocks() should have
+ * locked every relation in the plan. When one was provided,
+ * pruning-aware locking should have locked at least the unpruned
+ * relations. Both checks are skipped in parallel workers, which
+ * acquire relation locks lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -265,6 +326,73 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present.
+ *
+ * Returns an EState that the caller must either pass to ExecutorStart()
+ * for reuse or free via FreeExecutorState() if execution will not proceed.
+ * GetCachedPlan() uses this to determine which partitions to lock after
+ * pruning; if the resulting EState is not delivered to ExecutorStart(),
+ * the executor would operate on unlocked relations.
+ */
+EState *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Caller must have established an active snapshot. */
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures needed for both initial and
+ * runtime partition pruning. These structures are built from the
+ * PartitionPruneInfo entries in the plan tree.
+ *
+ * Also perform initial pruning to compute the subset of child subplans
+ * that will be executed. The results, which are bitmapsets of selected
+ * child indexes, are saved in es_part_prune_results. This list is parallel
+ * to es_part_prune_infos.
+ */
+ ExecCreatePartitionPruneStates(estate);
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ return estate;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -840,38 +968,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecCreatePartitionPruneStates(estate);
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..024780d3516 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1300,7 +1300,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ receiver, paramLI, NULL, instrument_options,
+ NULL);
}
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..952a784c924 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1369,7 +1369,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest,
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
- 0);
+ 0,
+ NULL);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..380bbc44e97 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_estates;
+ ListCell *prep_lc;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_estates = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,9 +2619,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ prep_lc = list_head(prep_estates);
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2695,7 +2701,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
dest,
options->params,
_SPI_current->queryEnv,
- 0);
+ 0,
+ prep_estate);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b3563113219..355a490cde9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1231,6 +1231,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2030,6 +2031,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..b18266487bb 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -72,7 +73,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options)
+ int instrument_options,
+ EState *prep_estate)
{
QueryDesc *qd = palloc_object(QueryDesc);
@@ -93,6 +95,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->planstate = NULL;
qd->totaltime = NULL;
+ /* Use the EState created by ExecutorPrep() if already done. */
+ qd->estate = prep_estate;
+
/* not yet executed */
qd->already_executed = false;
@@ -123,6 +128,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep_estate: EState created in ExecutorPrep() for the query, if any
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +141,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -148,7 +155,8 @@ ProcessQuery(PlannedStmt *plan,
*/
queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0,
+ prep_estate);
/*
* Call ExecutorStart to prepare the plan for execution
@@ -495,7 +503,10 @@ PortalStart(Portal portal, ParamListInfo params,
None_Receiver,
params,
portal->queryEnv,
- 0);
+ 0,
+ portal->prep_estates ?
+ (EState *) linitial(portal->prep_estates) :
+ NULL);
/*
* If it's a scrollable cursor, executor needs to support
@@ -1185,6 +1196,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ ListCell *prep_lc;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1217,11 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ prep_lc = list_head(portal->prep_estates);
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ EState *prep_estate = next_prep_estate(portal->prep_estates, &prep_lc);
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1279,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1288,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..443b583637c 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -286,6 +286,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->prep_estates = prep_estates;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..71ebe38bc86 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -64,7 +64,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..3a2169c9613 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
@@ -63,7 +63,8 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options);
+ int instrument_options,
+ EState *prep_estate);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 064df01811e..24604120c27 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,31 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern EState *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ int eflags);
+
+/*
+ * Walk a prep_estates list in step with a parallel stmt_list iteration.
+ * Returns the next EState (or NULL) and advances *lc.
+ *
+ * Safe when prep_estates is NIL; just returns NULL for every call.
+ */
+static inline EState *
+next_prep_estate(List *prep_estates, ListCell **lc)
+{
+ EState *result = NULL;
+
+ if (*lc != NULL)
+ {
+ result = (EState *) lfirst(*lc);
+ *lc = lnext(prep_estates, *lc);
+ }
+ return result;
+}
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 0716c5a9aed..42d75693d43 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -784,7 +784,6 @@ typedef struct EState
List *es_insert_pending_modifytables;
} EState;
-
/*
* ExecRowMark -
* runtime representation of FOR [KEY] UPDATE/SHARE clauses
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..f69b4b9b479 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *prep_estates; /* list of EStates where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
[application/octet-stream] v8-0001-Refactor-partition-pruning-initialization-for-cla.patch (10.2K, 6-v8-0001-Refactor-partition-pruning-initialization-for-cla.patch)
From a79af61882f1ff696d46f612a5b3a8ce50ee75d6 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 15:08:52 +0900
Subject: [PATCH v8 1/5] Refactor partition pruning initialization for clarity
and modularity
Move the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. This separates the setup of pruning state from the execution
of initial pruning logic, making the code clearer and easier to
maintain.
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called at a time when the PARAM_EXEC
parameters have not yet been set up.
This refactoring allows callers to reuse the pruning setup logic
without always triggering pruning, a capability useful for future use
cases that may only need metadata initialization.
---
src/backend/executor/execMain.c | 1 +
src/backend/executor/execPartition.c | 103 +++++++++++++++++++--------
src/include/executor/execPartition.h | 1 +
3 files changed, 74 insertions(+), 31 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..c58a2abe9a7 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -870,6 +870,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
* to each PartitionPruneInfo entry, and the es_part_prune_results list is
* parallel to es_part_prune_infos.
*/
+ ExecCreatePartitionPruneStates(estate);
ExecDoInitialPruning(estate);
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..feea9fdfde0 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1943,6 +1942,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1967,6 +1969,29 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*-------------------------------------------------------------------------
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
/*
* ExecDoInitialPruning
@@ -1974,11 +1999,11 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,20 +2021,13 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
- foreach(lc, estate->es_part_prune_infos)
+ Assert(estate->es_part_prune_results == NULL);
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment.
@@ -2017,8 +2035,6 @@ ExecDoInitialPruning(EState *estate)
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2152,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2391,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,10 +2404,29 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
}
}
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
+ }
+#endif
+ }
j++;
}
@@ -2490,9 +2523,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2554,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 82063ec2a16..4c96808c376 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2026-03-25 07:39 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2026-03-25 07:39 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Fri, Mar 20, 2026 at 2:20 AM Amit Langote <[email protected]> wrote:
> On Mon, Mar 9, 2026 at 1:41 PM Amit Langote <[email protected]> wrote:
> > On Sat, Mar 7, 2026 at 6:54 PM Amit Langote <[email protected]> wrote:
> > > Attached is v6 of the patch series. I've been working toward
> > > committing this, so I wanted to lay out the ExecutorPrep() design and
> > > the key trade-offs before doing so.
> > >
> > > When a cached generic plan references a partitioned table,
> > > GetCachedPlan() locks all partitions upfront via
> > > AcquireExecutorLocks(), even those that initial pruning will
> > > eliminate. But initial partition pruning only runs later during
> > > ExecutorStart(). Moving pruning earlier requires some executor setup
> > > (range table, permissions, pruning state), and ExecutorPrep() is the
> > > vehicle for that. Unlike the approach reverted last May, this
> > > keeps the CachedPlan itself unchanged -- all per-execution state flows
> > > through a separate CachedPlanPrepData that the caller provides.
> > >
> > > The approach also keeps GetCachedPlan()'s interface
> > > backward-compatible: the new CachedPlanPrepData argument is optional.
> > > If a caller passes NULL, all partitions are locked as before and
> > > nothing changes. This means existing callers and any new code that
> > > calls GetCachedPlan() without caring about pruning-aware locking just
> > > works.
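
To illustrate the optional-argument behaviour described above, here is a minimal sketch in plain C. It is not code from the patch: ToyPrepData, toy_lock_partitions() and the one-bit-per-partition bitmask are made up for the example, and the real GetCachedPlan() and CachedPlanPrepData are not modelled beyond the NULL / non-NULL distinction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for CachedPlanPrepData: receives the pruning result. */
typedef struct ToyPrepData
{
    uint32_t surviving;     /* bit i set => partition i survived initial pruning */
} ToyPrepData;

/*
 * Toy model of the locking decision.  all_parts has one bit per partition in
 * the plan; pruned_away has one bit per partition that initial pruning would
 * eliminate.  Returns the set of partitions that get locked.
 */
uint32_t
toy_lock_partitions(uint32_t all_parts, uint32_t pruned_away, ToyPrepData *prep)
{
    if (prep == NULL)
        return all_parts;       /* caller did not opt in: lock everything */

    /* caller opted in: remember the survivors and lock only those */
    prep->surviving = all_parts & ~pruned_away;
    return prep->surviving;
}
```

A caller passing NULL sees exactly the old lock set; only a caller that supplies the prep structure gets the reduced one, along with the obligation to carry the result forward.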
> > >
> > > The risk is on the other side: if a caller does pass a
> > > CachedPlanPrepData, GetCachedPlan() will lock only the surviving
> > > partitions and populate prep_estates with the EStates that
> > > ExecutorPrep() created. The caller then must make those EStates
> > > available to ExecutorStart() -- via QueryDesc->estate,
> > > portal->prep_estates, or the equivalent path for SPI and SQL
> > > functions. If it fails to do so, ExecutorStart() will call
> > > ExecutorPrep() again, which may compute different pruning results than
> > > the original call, potentially expecting locks on relations that were
> > > never acquired. The executor would then operate on relations it
> > > doesn't hold locks on.
> > >
> > > So the contract is: if you opt in to pruning-aware locking by passing
> > > CachedPlanPrepData, you must complete the pipeline by delivering the
> > > prep EStates to the executor. In the current patch, all the call sites
> > > that pass a CachedPlanPrepData (portals, SPI, EXECUTE, SQL functions,
> > > EXPLAIN) do thread the EStates through correctly, and I've tried to
> > > make the plumbing straightforward enough that it's hard to get wrong.
> > > But it is a new invariant that didn't exist before, and a caller that
> > > gets it wrong would fail silently rather than with an obvious error.
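
The contract can be sketched with a toy model, assuming nothing beyond what the paragraph above says. ToyEState, ToyQueryDesc, toy_executor_prep() and toy_executor_start() are hypothetical names invented for this illustration; the counter merely makes a second prep pass visible and stands in for "pruning may be recomputed".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for EState and QueryDesc. */
typedef struct ToyEState { int prep_serial; } ToyEState;
typedef struct ToyQueryDesc { ToyEState *estate; } ToyQueryDesc;

int toy_prep_calls = 0;     /* how many times the prep step has run */

/* Models ExecutorPrep(): fills in caller-provided storage and returns it. */
ToyEState *
toy_executor_prep(ToyEState *storage)
{
    storage->prep_serial = ++toy_prep_calls;
    return storage;
}

/* Models the reuse-or-create step at the top of standard_ExecutorStart(). */
void
toy_executor_start(ToyQueryDesc *qd, ToyEState *fallback_storage)
{
    /* Only prep again if no prep state was delivered by the caller. */
    if (qd->estate == NULL)
        qd->estate = toy_executor_prep(fallback_storage);
}
```

If the caller stores the state returned by toy_executor_prep() in the descriptor, toy_executor_start() leaves it alone; if the caller drops it, a second prep pass runs, which is the situation the contract is meant to rule out.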
> > >
> > > To catch such violations, I've added a debug-only check in
> > > standard_ExecutorStart() that fires when no prep EState was provided.
> > > It iterates over the plan's rtable and verifies that every lockable
> > > relation is actually locked. It should always be true if
> > > AcquireExecutorLocks() locked everything, but would fail if
> > > pruning-aware locking happened upstream and the caller dropped the
> > > prep EState. The check is skipped in parallel workers, which acquire
> > > relation locks lazily in ExecGetRangeTableRelation().
> > >
> > > + if (queryDesc->estate == NULL)
> > > + {
> > > +#ifdef USE_ASSERT_CHECKING
> > > + if (!IsParallelWorker())
> > > + {
> > > + ListCell *lc;
> > > +
> > > + foreach(lc, queryDesc->plannedstmt->rtable)
> > > + {
> > > + RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
> > > +
> > > + if (rte->rtekind == RTE_RELATION ||
> > > + (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
> > > + Assert(CheckRelationOidLockedByMe(rte->relid,
> > > + rte->rellockmode,
> > > + true));
> > > + }
> > > + }
> > > +#endif
> > > + queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
> > > + queryDesc->params,
> > > + CurrentResourceOwner,
> > > + true,
> > > + eflags);
> > > + }
> > > +#ifdef USE_ASSERT_CHECKING
> > > + else
> > > + {
> > > + /*
> > > + * A prep EState was provided, meaning pruning-aware locking
> > > + * should have locked at least the unpruned relations.
> > > + */
> > > + if (!IsParallelWorker())
> > > + {
> > > + int rtindex = -1;
> > > +
> > > + while ((rtindex =
> > > bms_next_member(queryDesc->estate->es_unpruned_relids,
> > > + rtindex)) >= 0)
> > > + {
> > > + RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
> > > +
> > > + Assert(rte->rtekind == RTE_RELATION ||
> > > + (rte->rtekind == RTE_SUBQUERY &&
> > > + rte->relid != InvalidOid));
> > > + Assert(CheckRelationOidLockedByMe(rte->relid,
> > > + rte->rellockmode, true));
> > > + }
> > > + }
> > > + }
> > > +#endif
> > >
> > > So the invariant is: if no prep EState was provided, every relation in
> > > the plan is locked; if one was provided, at least the unpruned
> > > relations are locked. Both are checked in assert builds.
> > >
> > > I think this covers the main concerns, but I may be missing something.
> > > If anyone sees a problem with this approach, I'd like to hear about
> > > it.
> >
> > Here's v7. Some plancache.c changes that I'd made were in the wrong
> > patch in v6; this version puts them where they belong.
>
> Attached is an updated set. One more fix: I added an Assert in
> SPI_cursor_open_internal()'s !plan->saved path to verify that
> prep_estates is NIL. Unsaved plans always take the custom plan path,
> so pruning-aware locking never applies, but it's worth guarding
> explicitly since the copyObject/ReleaseCachedPlan sequence that
> follows would not be safe otherwise. Also changed
> SPI_plan_get_cached_plan() to pass NULL for cprep, since it only
> returns the CachedPlan pointer and has no way to deliver prep_estates
> to anyone.
>
> Stepping back -- the core question is whether running executor logic
> (pruning) inside GetCachedPlan() is acceptable at all. The plan cache
> and executor have always had a clean boundary: plan cache locks
> everything, executor runs. This optimization necessarily crosses that
> line, because the information needed to decide which locks to skip
> (pruning results) can only come from executor machinery.
>
> The proposed approach has GetCachedPlan() call ExecutorPrep() to do a
> limited subset of executor work (range table init, permissions,
> pruning), carry the results out through CachedPlanPrepData, and leave
> the CachedPlan itself untouched. The executor already has a multi-step
> protocol: start/run/end. prep/start/run/end is just a finer
> decomposition of what InitPlan() was already doing inside
> ExecutorStart().
>
> Of the attached patches, I'm targeting 0001-0003 for commit. 0004 (SQL
> function support) and 0005 (parallel worker reuse) are useful
> follow-ons but not essential. The optimization works without them for
> most cases, and they can be reviewed and committed separately.
>
> If there's a cleaner way to avoid locking pruned partitions without
> the plumbing this patch adds, I haven't found it in the year since the
> revert. I'd welcome a pointer if you see one. Failing that, I think
> this is the right trade-off, but it's a judgment call about where to
> hold your nose.
>
> Tom, I'd value your opinion on whether this approach is something
> you'd be comfortable seeing in the tree.
Attached is an updated set with some cleanup after another pass.
- Removed ExecCreatePartitionPruneStates() from 0001. In 0001-0003,
ExecDoInitialPruning() handles both setup and pruning internally; the
split isn't needed yet.
- Tightened commit messages to describe what each commit does now, not
what later commits will use it for. In particular, 0002 is upfront
that the portal/SPI/EXPLAIN plumbing is scaffolding that 0003 lights
up.
- Updated setrefs.c comment for firstResultRels to drop a blanket
claim about one ModifyTable per query level.
As before, 0001-0003 are the focus, and possibly 0004, which teaches the
new GetCachedPlan() pruning-aware contract to its relatively new user in
functions.c.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v9-0004-Make-SQL-function-executor-track-ExecutorPrep-sta.patch (7.8K, 2-v9-0004-Make-SQL-function-executor-track-ExecutorPrep-sta.patch)
From 3aedeffabed40d317f1f7e2bb80bce8063429795 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:09:23 +0900
Subject: [PATCH v9 4/5] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
prep EState pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep EState's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.
This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 29 +++++++++++++--
src/test/regress/expected/plancache.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 34 ++++++++++++++++++
3 files changed, 109 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index c0ca72b38dd..f246f051c25 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -73,6 +73,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ EState *prep_estate; /* EState created in ExecutorPrep() for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -658,6 +659,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
+ ListCell *prep_lc;
/*
* Clean up after previous query, if there was one.
@@ -696,11 +699,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -721,9 +733,11 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ prep_lc = list_head(cprep.prep_estates);
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
+ EState *prep_estate = next_prep_estate(cprep.prep_estates, &prep_lc);
execution_state *newes;
/*
@@ -765,6 +779,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep_estate = prep_estate;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1363,6 +1378,15 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per call lifetime.
+ */
+ if (es->prep_estate)
+ MemoryContextSetParent(es->prep_estate->es_query_cxt,
+ fcache->subcontext);
+
es->qd = CreateQueryDesc(es->stmt,
fcache->func->src,
GetActiveSnapshot(),
@@ -1371,7 +1395,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
0,
- NULL);
+ es->prep_estate);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
@@ -1462,6 +1486,7 @@ postquel_end(execution_state *es, SQLFunctionCachePtr fcache)
FreeQueryDesc(es->qd);
es->qd = NULL;
+ es->prep_estate = NULL;
MemoryContextSwitchTo(oldcontext);
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 1d69ab0a1c2..371673a6e96 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -459,4 +459,52 @@ NOTICE: creating index on partition inval_during_pruning_p1
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select * from sqlf_base order by 1;
+ id | val
+----+-----
+ 1 | 30
+(1 row)
+
+select * from sqlf_log order by 1;
+ id | note
+----+----------------
+ 1 | logged by rule
+ 1 | logged by rule
+(2 rows)
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 139b4688fd6..b89c9ad69a4 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -273,4 +273,38 @@ drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
deallocate inval_during_pruning_q;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+select * from sqlf_base order by 1;
+select * from sqlf_log order by 1;
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
--
2.47.3
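A note on the 0004 patch above: init_execution_state() walks cprep.prep_estates in step with cplan->stmt_list through next_prep_estate(), whose definition is not part of this excerpt. The sketch below is only a toy model of that lockstep walk, under the assumption that an empty prep list means "no prep was done" and yields NULL for every statement. ToyPrep, ToyPrepCursor, toy_next_prep() and toy_pair_statements() are hypothetical names, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a prep EState; not a PostgreSQL type. */
typedef struct ToyPrep
{
    int         stmt_index;     /* which statement this state was built for */
} ToyPrep;

/* Cursor over an optional array of prep states (hypothetical helper type). */
typedef struct ToyPrepCursor
{
    ToyPrep   **items;          /* NULL when no prep states exist */
    size_t      count;          /* number of entries in items */
    size_t      pos;            /* next entry to hand out */
} ToyPrepCursor;

/*
 * Hand out the prep state for the next statement.  An empty list means
 * "no prep was done" (an assumption of this sketch), so every statement
 * gets NULL.  A non-empty list must hold one entry per statement.
 */
ToyPrep *
toy_next_prep(ToyPrepCursor *cur)
{
    if (cur->items == NULL || cur->count == 0)
        return NULL;
    assert(cur->pos < cur->count);
    return cur->items[cur->pos++];
}

/*
 * Fill out[i] with the prep state paired with statement i, walking both
 * sequences in step.  Returns how many statements received a prep state.
 */
size_t
toy_pair_statements(size_t nstmts, ToyPrepCursor *cur, ToyPrep **out)
{
    size_t      paired = 0;

    for (size_t i = 0; i < nstmts; i++)
    {
        out[i] = toy_next_prep(cur);
        if (out[i] != NULL)
            paired++;
    }
    return paired;
}
```

The point of routing every statement through one helper is that the "no prep states at all" case and the "one prep state per statement" case share the same loop, so a caller cannot pair statement N with the state built for statement N+1.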
[application/octet-stream] v9-0005-Reuse-partition-pruning-results-in-parallel-worke.patch (15.8K, 3-v9-0005-Reuse-partition-pruning-results-in-parallel-worke.patch)
From ddcbd693f9aa8498c06b4f20fe4df20ff98974c5 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:57 +0900
Subject: [PATCH v9 5/5] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Factor the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. Parallel workers need to set up pruning state without
performing initial pruning, since they receive the leader's results
instead.
Introduce CheckInitialPruningResultsInWorker() (debug-builds only)
to verify that the results match what the worker would compute.
This check helps catch inconsistencies across leader and worker
pruning logic.
---
src/backend/executor/execMain.c | 25 +++++--
src/backend/executor/execParallel.c | 108 ++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 44 ++++++++---
src/backend/utils/cache/plancache.c | 2 +-
src/include/executor/execPartition.h | 1 +
src/include/executor/executor.h | 3 +-
6 files changed, 161 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 336bd4d09b3..5fa312436fb 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -207,7 +207,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
queryDesc->params,
CurrentResourceOwner,
- eflags);
+ eflags, true);
}
#ifdef USE_ASSERT_CHECKING
else
@@ -330,7 +330,8 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
* ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
*
* Performs range table initialization, permission checks, and initial
- * partition pruning if partPruneInfos are present.
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true; false in a parallel worker.
*
* Returns an EState that the caller must either pass to ExecutorStart()
* for reuse or free via FreeExecutorState() if execution will not proceed.
@@ -341,7 +342,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
EState *
ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
- int eflags)
+ int eflags, bool do_initial_pruning)
{
ResourceOwner oldowner;
EState *estate;
@@ -377,14 +378,22 @@ ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
CurrentResourceOwner = owner;
/*
- * Set up PartitionPruneState structures and perform initial partition
- * pruning to compute the subset of child subplans that will be
- * executed. The results, which are bitmapsets of selected child
- * indexes, are saved in es_part_prune_results, parallel to
+ * Set up PartitionPruneState structures needed for initial
+ * partition pruning.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to
+ * compute the subset of child subplans that will be executed.
+ * The results, which are bitmapsets of selected child indexes,
+ * are saved in es_part_prune_results, parallel to
* es_part_prune_infos. RT indexes of surviving partitions are
* added to es_unpruned_relids.
+ *
+ * Parallel workers pass false here and instead receive the
+ * leader's pruning results via shared memory.
*/
- ExecDoInitialPruning(estate);
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
CurrentResourceOwner = oldowner;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 024780d3516..2de4b35a16e 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -67,6 +68,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -141,6 +144,8 @@ static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
/* Helper function that runs in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static void CheckInitialPruningResultsInWorker(EState *estate);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -620,12 +625,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -654,6 +665,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -680,6 +693,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -781,6 +804,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1280,10 +1313,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ EState *prep_estate = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1296,12 +1334,80 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep_estate = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, 0, false);
+ Assert(prep_estate);
+
+ prep_estate->es_part_prune_results = part_prune_results;
+ prep_estate->es_unpruned_relids =
+ bms_add_members(prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * A debug-build-only check that the pruning results passed from the
+ * leader match what the worker would independently compute.
+ */
+ CheckInitialPruningResultsInWorker(prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options,
- NULL);
+ prep_estate);
+}
+
+/*
+ * CheckInitialPruningResultsInWorker
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ *
+ * Only performed in debug builds.
+ */
+static void
+CheckInitialPruningResultsInWorker(EState *estate)
+{
+#ifdef USE_ASSERT_CHECKING
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (!bms_equal(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unprunable_relids in parallel worker");
+ }
+#endif
}
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 2a3af006f77..47322614aad 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1942,6 +1942,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1967,15 +1970,40 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
+
/*
* ExecDoInitialPruning
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ *
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
* assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
@@ -1996,18 +2024,12 @@ ExecDoInitialPruning(EState *estate)
ListCell *lc;
Assert(estate->es_part_prune_results == NULL);
- foreach(lc, estate->es_part_prune_infos)
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment. RT indexes
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index bb62c648899..879b2d012a1 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -2102,7 +2102,7 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
}
prep_estate = ExecutorPrep(plannedstmt, cprep->params,
- cprep->owner, cprep->eflags);
+ cprep->owner, cprep->eflags, true);
Assert(prep_estate);
cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 82063ec2a16..4c96808c376 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 4505ceaca3c..8e5fde965ed 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -240,7 +240,8 @@ extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern EState *ExecutorPrep(PlannedStmt *pstmt,
ParamListInfo params,
ResourceOwner owner,
- int eflags);
+ int eflags,
+ bool do_initial_pruning);
/*
* Walk a prep_estates list in step with a parallel stmt_list iteration.
--
2.47.3
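A note on the 0005 patch above: CheckInitialPruningResultsInWorker() recomputes initial pruning in the worker and compares it with what the leader sent, requiring the subplan sets to be equal and the locally computed RT indexes to be contained in the leader's unpruned relids. The sketch below models only that comparison, with 64-bit masks standing in for Bitmapsets and a trivial list-partition match standing in for real pruning steps. ToySet, toy_initial_prune() and toy_worker_results_match() are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bitmap: bit i set means member i is present (at most 64 members). */
typedef uint64_t ToySet;

/*
 * Toy "initial pruning" for a list-partitioned table: keep only the
 * partitions whose accepted value equals the parameter.
 */
ToySet
toy_initial_prune(const int *part_values, size_t nparts, int param)
{
    ToySet      survivors = 0;

    assert(nparts <= 64);
    for (size_t i = 0; i < nparts; i++)
    {
        if (part_values[i] == param)
            survivors |= (ToySet) 1 << i;
    }
    return survivors;
}

/*
 * Worker-side cross-check, mirroring the two tests in the patch:
 * the surviving subplans must be identical, and every relation the
 * worker considers unpruned must already be in the leader's set.
 */
bool
toy_worker_results_match(ToySet leader_subplans, ToySet leader_relids,
                         ToySet local_subplans, ToySet local_relids)
{
    if (local_subplans != leader_subplans)
        return false;           /* analogous to !bms_equal() */
    if ((local_relids & ~leader_relids) != 0)
        return false;           /* analogous to bms_nonempty_difference() */
    return true;
}
```

The relids test is a subset check rather than an equality check because the leader's set can also contain relations that are not subject to pruning at all, so the worker's pruning-derived set is expected to be the smaller one.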
[application/octet-stream] v9-0003-Use-pruning-aware-locking-in-cached-plans.patch (41.8K, 4-v9-0003-Use-pruning-aware-locking-in-cached-plans.patch)
From a5cbee90d2f57c0b775ecc9d959bdcf9fe864075 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 19 Mar 2026 19:02:04 +0900
Subject: [PATCH v9 3/5] Use pruning-aware locking in cached plans
Extend GetCachedPlan() to perform ExecutorPrep() on each planned
statement, capturing unpruned relids and initial pruning results.
Use this data to acquire execution locks only on surviving partitions,
avoiding unnecessary locking of pruned tables even when using cached
plans.
Introduce CachedPlanPrepData to carry the EStates created by
ExecutorPrep() through the plan caching layer. The prep_estates
list is indexed one-to-one with CachedPlan->stmt_list and is
populated when GetCachedPlan() prepares a reused generic plan.
Adjust call sites in SPI, functions, portals, and EXPLAIN to
propagate this data.
Partition pruning expressions may call PL functions that require
an active snapshot (e.g., via EnsurePortalSnapshotExists()).
AcquireExecutorLocksUnpruned() establishes one before calling
ExecutorPrep() if needed, ensuring these expressions can execute
correctly during plan cache validation.
To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.
Regression tests are included to verify:
- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.
Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this was
an optimization (avoid processing pruned partitions); it is now a
correctness requirement, because pruned partitions may not be locked.
ExecGetRangeTableRelation() already enforces this with an error when
called on a pruned relation.
---
src/backend/commands/prepare.c | 17 +-
src/backend/executor/execMain.c | 4 +
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/executor/spi.c | 22 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 18 ++
src/backend/tcop/postgres.c | 7 +-
src/backend/utils/cache/plancache.c | 257 +++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 35 ++-
src/test/regress/expected/partition_prune.out | 145 ++++++++++
src/test/regress/expected/plancache.out | 62 +++++
src/test/regress/sql/partition_prune.sql | 77 ++++++
src/test/regress/sql/plancache.sql | 51 ++++
16 files changed, 691 insertions(+), 24 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index c7bab14b633..fec83cc6fd4 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -156,6 +156,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -195,7 +196,9 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
/*
@@ -207,7 +210,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -577,6 +580,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
List *prep_estates;
ListCell *p;
@@ -635,8 +639,13 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -655,7 +664,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/* Explain each query */
prep_lc = list_head(prep_estates);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 282c9871de0..336bd4d09b3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -334,6 +334,10 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*
* Returns an EState that the caller must either pass to ExecutorStart()
* for reuse or free via FreeExecutorState() if execution will not proceed.
+ * GetCachedPlan() uses this to determine, based on initial pruning
+ * results, which partitions to lock; if the resulting EState is not
+ * delivered to ExecutorStart(), the executor would operate on unlocked
+ * relations. See the assert checks in standard_ExecutorStart().
*/
EState *
ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 952a784c924..c0ca72b38dd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -699,6 +699,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..9230f2b554f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4865,8 +4865,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -4880,6 +4880,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 380bbc44e97..f1d84f7a350 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1580,6 +1580,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1660,7 +1661,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
if (!plan->saved)
@@ -1670,7 +1674,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
* so must copy the plan into the portal's context. An error here
* will result in leaking our refcount on the plan, but it doesn't
* matter because the plan is unsaved and hence transient anyway.
+ *
+ * Unsaved plans use custom plans, so prep should be a no-op.
*/
+ Assert(cprep.prep_estates == NIL);
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
MemoryContextSwitchTo(oldcontext);
@@ -1686,7 +1693,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/*
@@ -2104,7 +2111,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2503,6 +2511,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
ListCell *lc2;
List *prep_estates;
ListCell *prep_lc;
+ CachedPlanPrepData cprep = {0};
spicallbackarg.query = plansource->query_string;
@@ -2577,11 +2586,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
- prep_estates = NIL;
+ prep_estates = cprep.prep_estates;
/*
* If we weren't given a specific snapshot to use, and the statement
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..afa61d357c5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1b5b9b5ed9c..8c9956e687e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,24 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of
+ * initially prunable relations. We use bms_next_member() to get
+ * the lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition
+ * expansion preserves RT index order. ExecInitModifyTable() asserts
+ * that the recorded index matches what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 355a490cde9..de362ff1672 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1637,6 +1637,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2018,7 +2019,9 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
/*
* Now we can define the portal.
@@ -2031,7 +2034,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NIL,
+ cprep.prep_estates,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..bb62c648899 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,14 +93,17 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksAll(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -942,6 +945,11 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
+ * If 'cprep' is not NULL, ExecutorPrep() is applied to each PlannedStmt to
+ * compute the set of partitions that survive initial runtime pruning, so that
+ * only those are locked. The EStates created to do so are saved in cprep for
+ * later reuse by ExecutorStart().
+ *
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
@@ -949,7 +957,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -983,7 +991,10 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, true, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1005,7 +1016,13 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, false, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, false);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -1285,6 +1302,15 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a generic plan is reused, the function
+ * performs initial pruning via ExecutorPrep() and locks only the
+ * surviving partitions. The resulting EStates are stored in
+ * cprep->prep_estates and must be delivered to ExecutorStart() via
+ * QueryDesc->estate (or the equivalent portal/SPI path). Failure
+ * to do so means the executor will operate on relations for which
+ * locks were never acquired. Passing NULL for cprep is always safe;
+ * all partitions are locked as before.
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1295,7 +1321,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1317,7 +1344,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (CheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1904,11 +1933,13 @@ QueryListGetPrimaryStmt(List *stmts)
}
/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ * AcquireExecutorLocksAll: acquire locks needed for execution of a cached
+ * plan; or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksAll(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1986,212 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given PlannedStmts.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - appends the EState pointer for each PlannedStmt to cprep->prep_estates
+ *
+ * On release, it:
+ * - looks up the EState for each PlannedStmt from cprep->prep_estates
+ * (which must already be populated)
+ * - unlocks the same relations identified during acquire
+ *
+ * prep_estates is extended during acquire and must match stmt_list one-to-one
+ * when releasing locks. Memory allocation for EState happens in
+ * cprep->context. Locks are acquired using cprep->owner.
+ */
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ List *prep_estates;
+ ListCell *prep_lc;
+
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+ /*
+ * When releasing locks, use the EState list (if any) created during
+ * acquisition to determine which relids to unlock. The list must match
+ * the PlannedStmt list one-to-one.
+ */
+ prep_estates = cprep->prep_estates;
+ Assert(acquire || list_length(prep_estates) == list_length(stmt_list));
+
+ prep_lc = list_head(prep_estates);
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+ EState *prep_estate;
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocksAll(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+
+ /* Keep the list one-to-one with stmt_list. */
+ if (acquire)
+ cprep->prep_estates = lappend(cprep->prep_estates, NULL);
+ else
+ (void) next_prep_estate(prep_estates, &prep_lc);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving runtime initial pruning. */
+ if (acquire)
+ {
+ /*
+ * Pruning expressions may call PL functions that require an active
+ * snapshot (e.g., via EnsurePortalSnapshotExists()). Establish one
+ * if needed.
+ */
+ bool snap_pushed = false;
+
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snap_pushed = true;
+ }
+
+ prep_estate = ExecutorPrep(plannedstmt, cprep->params,
+ cprep->owner, cprep->eflags);
+ Assert(prep_estate);
+ cprep->prep_estates = lappend(cprep->prep_estates, prep_estate);
+
+ if (snap_pushed)
+ PopActiveSnapshot();
+ }
+ else
+ prep_estate = next_prep_estate(prep_estates, &prep_lc);
+
+ if (prep_estate)
+ {
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * We must always include the first result relation of each
+ * ModifyTable node in the plan (that is, each one listed in
+ * plannedstmt->firstResultRels) in the set of relations to be
+ * locked, to satisfy the executor assumptions described
+ * in ExecInitModifyTable(). This can be wasteful, because we
+ * may not need to use the first result relation at all if other
+ * result relations are unpruned and thus sufficient for the
+ * ModifyTable node's needs. Unfortunately, we don't have per-node
+ * unpruned_relids set to determine that other result relations
+ * are included.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * CachedPlanPrepCleanup
+ * Clean up EState built for a generic plan.
+ *
+ * This is used in the corner case where CheckCachedPlan() discovers
+ * that a CachedPlan has become invalid after AcquireExecutorLocksUnpruned()
+ * has already run. In that case we must both release the execution locks
+ * and dispose of the EState list stored in CachedPlanPrepData, since the
+ * executor will never see or clean it up.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ ListCell *lc;
+ ResourceOwner oldowner;
+
+ if (cprep == NULL)
+ return;
+
+ /* Switch to owner that ExecutorPrep() would have used. */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = cprep->owner;
+ foreach(lc, cprep->prep_estates)
+ {
+ EState *prep_estate = (EState *) lfirst(lc);
+
+ if (prep_estate == NULL)
+ continue;
+
+ ExecCloseRangeTableRelations(prep_estate);
+ FreeExecutorState(prep_estate);
+ }
+ CurrentResourceOwner = oldowner;
+
+ list_free(cprep->prep_estates);
+ cprep->prep_estates = NIL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 27758ec16fe..4fd9d9bcc56 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..55279cbbda8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -121,6 +121,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor-side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..177150a5848 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -197,6 +197,38 @@ typedef struct CachedExpression
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for each PlannedStmt in a CachedPlan,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_estates is indexed one-to-one with CachedPlan->stmt_list, and is
+ * populated when GetCachedPlan() prepares a reused generic plan. If the
+ * plan is found invalid after locking, the same list is used to determine
+ * which relations to unlock before retrying.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ *
+ * eflags controls ExecutorPrep() behavior during initial pruning.
+ * Normally zero; set EXEC_FLAG_EXPLAIN_GENERIC to suppress pruning
+ * in EXPLAIN (GENERIC_PLAN). Need not match the eflags later passed
+ * to ExecutorStart().
+ *
+ * prep_estates must reach ExecutorStart() to be adopted for execution.
+ * If the plan is invalidated before that happens, CachedPlanPrepCleanup()
+ * frees them instead. The EStates are allocated in 'context' and their
+ * resources tracked under 'owner', which the caller sets to match the
+ * execution environment (e.g., portal context and resowner).
+ */
+typedef struct CachedPlanPrepData
+{
+ List *prep_estates; /* one EState per PlannedStmt, or NULL */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate EState and its fields */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to control ExecutorPrep */
+} CachedPlanPrepData;
+
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +272,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..8e0cc98baca 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,148 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..1d69ab0a1c2 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,65 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..804dd3c8f4e 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,80 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..139b4688fd6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,54 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- Test invalidation of a generic plan during pruning-aware lock setup.
+-- The pruning expression uses a stable SQL function that calls a volatile
+-- plpgsql function. That function performs DDL on a partition when a
+-- separate "signal" table says to do so. The second EXECUTE should
+-- replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- pruning parameter
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+deallocate inval_during_pruning_q;
+
+reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v9-0001-Refactor-executor-s-initial-partition-pruning-set.patch (7.3K, 5-v9-0001-Refactor-executor-s-initial-partition-pruning-set.patch)
From 6b2a9740b49a5238569cfeeb11fa632225ec2cfb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:38 +0900
Subject: [PATCH v9 1/5] Refactor executor's initial partition pruning setup
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called before PARAM_EXEC parameters are
set up. A later commit needs this when running pruning state setup
outside of InitPlan().
No behavioral change.
---
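Illustration (not part of the commit message): the refactoring described
above replaces an out parameter with a direct update of executor state.
The standalone sketch below shows only that shape. ToyEState, ToyPruneInfo,
toy_create_prune_state() and toy_is_unpruned() are hypothetical names made
up for this note, and a 64-bit mask stands in for a Bitmapset; none of it
is code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t ToyRelids;

/* Toy stand-in for EState; only the field relevant to this sketch. */
typedef struct ToyEState
{
	ToyRelids	unpruned_relids;
} ToyEState;

/* Toy stand-in for one pruning entry (hypothetical layout). */
typedef struct ToyPruneInfo
{
	bool		has_initial_steps;	/* plan carries initial pruning steps */
	bool		do_initial_prune;	/* false when pruning is being skipped */
	const int  *leafpart_rtis;		/* RT index per partition, 0 if none */
	int			nparts;
} ToyPruneInfo;

/*
 * Record leaf partition RT indexes directly in the state when initial
 * pruning steps exist but are being skipped, instead of handing them
 * back to the caller through an out parameter.
 */
static void
toy_create_prune_state(ToyEState *estate, const ToyPruneInfo *info)
{
	if (info->has_initial_steps && !info->do_initial_prune)
	{
		for (int i = 0; i < info->nparts; i++)
		{
			int			rti = info->leafpart_rtis[i];

			/* 0 means "no RT index"; the toy mask only holds 1..63 */
			if (rti > 0 && rti < 64)
				estate->unpruned_relids |= (UINT64_C(1) << rti);
		}
	}
}

/* Membership test on the toy mask. */
static bool
toy_is_unpruned(const ToyEState *estate, int rti)
{
	return rti > 0 && rti < 64 &&
		(estate->unpruned_relids & (UINT64_C(1) << rti)) != 0;
}
```

With this shape the caller no longer needs a branch that copies the
"all leaf partitions" set when pruning is skipped; it only merges the
result of pruning when pruning actually ran.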
src/backend/executor/execPartition.c | 70 +++++++++++++++++++---------
1 file changed, 48 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..2a3af006f77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1978,7 +1977,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
* and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,29 +1995,31 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ Assert(estate->es_part_prune_results == NULL);
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
/*
* Perform initial pruning steps, if any, and save the result
- * bitmapset or NULL as described in the header comment.
+ * bitmapset or NULL as described in the header comment. RT indexes
+ * of surviving partitions are added to validsubplan_rtis.
+ *
+ * Note that when do_initial_prune is false,
+ * CreatePartitionPruneState() has already added the RT indexes of
+ * all leaf partitions to es_unpruned_relids directly.
*/
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2137,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2376,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,9 +2389,28 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
+ }
+ }
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
}
+#endif
}
j++;
@@ -2490,9 +2508,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2539,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
--
2.47.3
[application/octet-stream] v9-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch (25.5K, 6-v9-0002-Introduce-ExecutorPrep-and-refactor-executor-star.patch)
From 32267b58bdf9db56a716abde9fcc3e4e8fac6fee Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:07:18 +0900
Subject: [PATCH v9 2/5] Introduce ExecutorPrep and refactor executor startup
Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorStart() calls it to build the EState, keeping
behavior unchanged.
If QueryDesc->estate is already set when ExecutorStart() is called,
the existing EState is reused and ExecutorPrep() is skipped. This
allows a later commit to supply a pre-built EState from outside
the executor.
Add scaffolding for carrying an optional prep EState through
CreateQueryDesc, PortalDefineQuery, and SPI. All callers currently
pass NULL/NIL; the next commit populates these to enable
pruning-aware locking in cached plans.
In assert builds, verify that the expected relation locks are held
when entering ExecutorStart().
---
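Illustration (not part of the commit message): the control flow described
above is "reuse the state if the caller supplied one, otherwise build it
now". The standalone sketch below shows that pattern together with the
ownership rule stated for ExecutorPrep() (pass the result on for reuse, or
free it). ToyEState, ToyQueryDesc, toy_executor_prep(),
toy_executor_start() and toy_free_estate() are hypothetical stand-ins made
up for this note, not the functions added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for EState. */
typedef struct ToyEState
{
	int			initialized;	/* set once by the prep step */
} ToyEState;

/* Toy stand-in for QueryDesc; estate is NULL unless prep ran earlier. */
typedef struct ToyQueryDesc
{
	ToyEState  *estate;
} ToyQueryDesc;

/* Counts how many times the prep step actually ran. */
static int	toy_prep_calls = 0;

/* Build the state; the caller must hand it to start or free it. */
static ToyEState *
toy_executor_prep(void)
{
	ToyEState  *estate = calloc(1, sizeof(ToyEState));

	if (estate == NULL)
		abort();
	estate->initialized = 1;
	toy_prep_calls++;
	return estate;
}

/* Release a state that will not be executed, or that is finished. */
static void
toy_free_estate(ToyEState *estate)
{
	free(estate);
}

/* Run prep only when no pre-built state was supplied. */
static void
toy_executor_start(ToyQueryDesc *qd)
{
	if (qd->estate == NULL)
		qd->estate = toy_executor_prep();

	/* the rest of startup would work on qd->estate from here on */
}
```

The point of the split is that the expensive part runs at most once per
execution, whichever side triggers it.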
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 9 +-
src/backend/executor/README | 11 +-
src/backend/executor/execMain.c | 157 +++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 9 +-
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 24 ++++-
src/backend/utils/mmgr/portalmem.c | 2 +
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 5 +-
src/include/executor/executor.h | 26 +++++
src/include/utils/portal.h | 2 +
19 files changed, 223 insertions(+), 50 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..b9bd5ba7078 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1011,7 +1011,7 @@ BeginCopyTo(ParseState *pstate,
cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/*
* Call ExecutorStart to prepare the plan for execution.
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 270e9bf3110..b4a9808955a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -336,7 +336,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e4b70166b0e..24c0c235fd3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -372,7 +372,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -494,7 +494,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -552,7 +553,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create a QueryDesc for the query */
queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ dest, params, queryEnv, instrument_option,
+ prep_estate);
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b98801d08f2..939e7a632f0 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1174,7 +1174,7 @@ execute_sql_string(const char *sql, const char *filename)
qdesc = CreateQueryDesc(stmt,
sql,
GetActiveSnapshot(), NULL,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2cdfdcf984b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..1e880a6d7c9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NIL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..c7bab14b633 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NIL,
cplan);
/*
@@ -577,7 +578,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
const char *query_string;
CachedPlan *cplan;
List *plan_list;
+ List *prep_estates;
ListCell *p;
+ ListCell *prep_lc;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
instr_time planstart;
@@ -652,14 +655,18 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
plan_list = cplan->stmt_list;
+ prep_estates = NIL;
/* Explain each query */
+ prep_lc = list_head(prep_estates);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, prep_estate,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..d749ceb6687 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,18 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Creates EState,
+ performs range table initialization, permission checks, and initial
+ partition pruning. Returns the EState that ExecutorStart() should
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if not already done, indicated by NULL QueryDesc.estate)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..282c9871de0 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -147,7 +148,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +173,70 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When
+ * no prep EState was provided, AcquireExecutorLocks() should have
+ * locked every relation in the plan. When one was provided,
+ * pruning-aware locking should have locked at least the unpruned
+ * relations. Both checks are skipped in parallel workers, which
+ * acquire relation locks lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -265,6 +326,67 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: prepare executor state for a PlannedStmt outside ExecutorStart.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present.
+ *
+ * Returns an EState that the caller must either pass to ExecutorStart()
+ * for reuse or free via FreeExecutorState() if execution will not proceed.
+ */
+EState *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Caller must have established an active snapshot. */
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Ensure locks taken during initial pruning are tracked under the given
+ * ResourceOwner (e.g., one associated with CachedPlan validation).
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures and perform initial partition
+ * pruning to compute the subset of child subplans that will be
+ * executed. The results, which are bitmapsets of selected child
+ * indexes, are saved in es_part_prune_results, parallel to
+ * es_part_prune_infos. RT indexes of surviving partitions are
+ * added to es_unpruned_relids.
+ */
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ return estate;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -840,37 +962,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..024780d3516 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1300,7 +1300,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ receiver, paramLI, NULL, instrument_options,
+ NULL);
}
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..952a784c924 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1369,7 +1369,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest,
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
- 0);
+ 0,
+ NULL);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..380bbc44e97 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NIL,
cplan);
/*
@@ -2500,6 +2501,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ List *prep_estates;
+ ListCell *prep_lc;
spicallbackarg.query = plansource->query_string;
@@ -2578,6 +2581,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
plan_owner, _SPI_current->queryEnv);
stmt_list = cplan->stmt_list;
+ prep_estates = NIL;
/*
* If we weren't given a specific snapshot to use, and the statement
@@ -2615,9 +2619,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ prep_lc = list_head(prep_estates);
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
+ EState *prep_estate = next_prep_estate(prep_estates, &prep_lc);
bool canSetTag = stmt->canSetTag;
DestReceiver *dest;
@@ -2695,7 +2701,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
dest,
options->params,
_SPI_current->queryEnv,
- 0);
+ 0,
+ prep_estate);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b3563113219..355a490cde9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1231,6 +1231,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NIL,
NULL);
/*
@@ -2030,6 +2031,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NIL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..b18266487bb 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -72,7 +73,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options)
+ int instrument_options,
+ EState *prep_estate)
{
QueryDesc *qd = palloc_object(QueryDesc);
@@ -93,6 +95,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->planstate = NULL;
qd->totaltime = NULL;
+ /* Use the EState created by ExecutorPrep() if already done. */
+ qd->estate = prep_estate;
+
/* not yet executed */
qd->already_executed = false;
@@ -123,6 +128,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep_estate: EState created in ExecutorPrep() for the query, if any
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +141,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -148,7 +155,8 @@ ProcessQuery(PlannedStmt *plan,
*/
queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0,
+ prep_estate);
/*
* Call ExecutorStart to prepare the plan for execution
@@ -495,7 +503,10 @@ PortalStart(Portal portal, ParamListInfo params,
None_Receiver,
params,
portal->queryEnv,
- 0);
+ 0,
+ portal->prep_estates ?
+ (EState *) linitial(portal->prep_estates) :
+ NULL);
/*
* If it's a scrollable cursor, executor needs to support
@@ -1185,6 +1196,7 @@ PortalRunMulti(Portal portal,
{
bool active_snapshot_set = false;
ListCell *stmtlist_item;
+ ListCell *prep_lc;
/*
* If the destination is DestRemoteExecute, change to DestNone. The
@@ -1205,9 +1217,11 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ prep_lc = list_head(portal->prep_estates);
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
+ EState *prep_estate = next_prep_estate(portal->prep_estates, &prep_lc);
/*
* If we got a cancel signal in prior command, quit
@@ -1265,7 +1279,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1288,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..443b583637c 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -286,6 +286,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +300,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->prep_estates = prep_estates;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..71ebe38bc86 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -64,7 +64,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..3a2169c9613 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
@@ -63,7 +63,8 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options);
+ int instrument_options,
+ EState *prep_estate);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..4505ceaca3c 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,31 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern EState *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ int eflags);
+
+/*
+ * Walk a prep_estates list in step with a parallel stmt_list iteration.
+ * Returns the next EState (or NULL) and advances *lc.
+ *
+ * Safe when prep_estates is NIL; just returns NULL for every call.
+ */
+static inline EState *
+next_prep_estate(List *prep_estates, ListCell **lc)
+{
+ EState *result = NULL;
+
+ if (*lc != NULL)
+ {
+ result = (EState *) lfirst(*lc);
+ *lc = lnext(prep_estates, *lc);
+ }
+ return result;
+}
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..f69b4b9b479 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ List *prep_estates; /* list of EStates where needed */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ List *prep_estates,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2026-03-26 09:24 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2026-03-26 09:24 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Wed, Mar 25, 2026 at 4:39 PM Amit Langote <[email protected]> wrote:
> On Fri, Mar 20, 2026 at 2:20 AM Amit Langote <[email protected]> wrote:
> > On Mon, Mar 9, 2026 at 1:41 PM Amit Langote <[email protected]> wrote:
> > Stepping back -- the core question is whether running executor logic
> > (pruning) inside GetCachedPlan() is acceptable at all. The plan cache
> > and executor have always had a clean boundary: plan cache locks
> > everything, executor runs. This optimization necessarily crosses that
> > line, because the information needed to decide which locks to skip
> > (pruning results) can only come from executor machinery.
> >
> > The proposed approach has GetCachedPlan() call ExecutorPrep() to do a
> > limited subset of executor work (range table init, permissions,
> > pruning), carry the results out through CachedPlanPrepData, and leave
> > the CachedPlan itself untouched. The executor already has a multi-step
> > protocol: start/run/end. prep/start/run/end is just a finer
> > decomposition of what InitPlan() was already doing inside
> > ExecutorStart().
> >
> > Of the attached patches, I'm targeting 0001-0003 for commit. 0004 (SQL
> > function support) and 0005 (parallel worker reuse) are useful
> > follow-ons but not essential. The optimization works without them for
> > most cases, and they can be reviewed and committed separately.
> >
> > If there's a cleaner way to avoid locking pruned partitions without
> > the plumbing this patch adds, I haven't found it in the year since the
> > revert. I'd welcome a pointer if you see one. Failing that, I think
> > this is the right trade-off, but it's a judgment call about where to
> > hold your nose.
> >
> > Tom, I'd value your opinion on whether this approach is something
> > you'd be comfortable seeing in the tree.
>
> Attached is an updated set with some cleanup after another pass.
>
> - Removed ExecCreatePartitionPruneStates() from 0001. In 0001-0003,
> ExecDoInitialPruning() handles both setup and pruning internally; the
> split isn't needed yet.
>
> - Tightened commit messages to describe what each commit does now, not
> what later commits will use it for. In particular, 0002 is upfront
> that the portal/SPI/EXPLAIN plumbing is scaffolding that 0003 lights
> up.
>
> - Updated setrefs.c comment for firstResultRels to drop a blanket
> claim about one ModifyTable per query level.
>
> As before, 0001-0003 is the focus, maybe 0004 which teaches the new
> GetCachedPlan() pruning-aware contract to its relatively new user in
> function.c.
While reviewing the patch more carefully, I realized there's a
correctness issue when rule rewriting causes a single statement to
expand into multiple PlannedStmts in one CachedPlan.
PortalRunMulti() executes those statements sequentially, with
CommandCounterIncrement() between them, so Q2's ExecutorStart()
normally sees the effects of Q1.
With the patch, though, AcquireExecutorLocksUnpruned() runs
ExecutorPrep() on all PlannedStmts in one pass during GetCachedPlan(),
before any statement executes. If a later statement has
initial-pruning expressions that read data modified by an earlier one,
pruning can see stale results.
There's also a memory lifetime issue: PortalRunMulti() calls
MemoryContextDeleteChildren(portalContext) between statements, which
destroys EStates prepared for later statements.
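To make the lifetime hazard concrete, here is a toy model in C. It is not PostgreSQL code: ToyContext and the toy_* functions are hypothetical stand-ins for portalContext, ExecutorPrep() and MemoryContextDeleteChildren(), and prepared states are tracked as slot numbers rather than real allocations.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_STATES 8

/* Hypothetical stand-in for a memory context owning prepared states. */
typedef struct ToyContext
{
	bool		live[TOY_MAX_STATES];	/* is the state in this slot valid? */
	int			nslots;
} ToyContext;

/* Models preparing one statement's state; returns a handle to it. */
int
toy_prep(ToyContext *cxt)
{
	assert(cxt->nslots < TOY_MAX_STATES);
	cxt->live[cxt->nslots] = true;
	return cxt->nslots++;
}

/* Models the between-statement cleanup: everything owned is destroyed. */
void
toy_delete_children(ToyContext *cxt)
{
	for (int i = 0; i < cxt->nslots; i++)
		cxt->live[i] = false;
}

bool
toy_is_live(const ToyContext *cxt, int handle)
{
	return handle >= 0 && handle < cxt->nslots && cxt->live[handle];
}

/*
 * Prepare both statements up front, "run" the first, clean up, and report
 * whether the second statement's prepared state is still usable.
 */
bool
toy_second_state_survives(void)
{
	ToyContext	cxt = {0};
	int			es1 = toy_prep(&cxt);
	int			es2 = toy_prep(&cxt);

	(void) es1;					/* statement 1 executes using es1 */
	toy_delete_children(&cxt);	/* cleanup between statements */
	return toy_is_live(&cxt, es2);
}
```

In this model the second statement's state never survives the cleanup, which is the shape of the problem described above.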
Here's a concrete case demonstrating the semantic issue:
create table multistmt_pt (a int, b int) partition by list (a);
create table multistmt_pt_1 partition of multistmt_pt for values in (1);
create table multistmt_pt_2 partition of multistmt_pt for values in (2);
insert into multistmt_pt values (1, 0), (2, 0);
create table prune_config (val int);
insert into prune_config values (1);
create function get_prune_val() returns int as $$
select val from prune_config;
$$ language sql stable;
-- rule action runs first, updating prune_config before the
-- original statement's pruning would normally be evaluated
create rule config_upd_rule as on update to multistmt_pt
do also update prune_config set val = 2;
set plan_cache_mode to force_generic_plan;
prepare multi_q as
update multistmt_pt set b = b + 1 where a = get_prune_val();
execute multi_q; -- creates the generic plan
-- reset for the real test
update prune_config set val = 1;
update multistmt_pt set b = 0;
-- second execute reuses the plan
execute multi_q;
select * from multistmt_pt order by a;
Without the patch: the rule action updates prune_config to val=2
first, then after CCI the original statement's initial pruning calls
get_prune_val(), gets 2, prunes to multistmt_pt_2, and updates it
correctly: (1, 0), (2, 1).
With the patch as it stood: both statements' pruning runs during
GetCachedPlan() before either executes. The original statement's
pruning sees val=1, prunes to multistmt_pt_1, and multistmt_pt_2 is
never touched.
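The ordering difference can be reduced to a small toy model in C. This is only a sketch under simplifying assumptions, not PostgreSQL code: toy_config_val stands in for prune_config.val, statement 0 for the rule action, statement 1 for the original UPDATE, and all toy_* names are hypothetical.

```c
#include <assert.h>

#define TOY_NSTMTS 2

/* Stand-in for the single value stored in prune_config. */
static int	toy_config_val;

/* Models "initial" pruning: the surviving partition matches the value. */
int
toy_initial_prune(void)
{
	return toy_config_val;
}

/* Statement 0 models the rule action; statement 1 changes no config. */
void
toy_execute(int stmt)
{
	if (stmt == 0)
		toy_config_val = 2;
}

/* Prune each statement just before it runs (ordering without the patch). */
int
toy_run_prune_late(void)
{
	int			pruned_to = 0;

	toy_config_val = 1;
	for (int stmt = 0; stmt < TOY_NSTMTS; stmt++)
	{
		if (stmt == 1)
			pruned_to = toy_initial_prune();
		toy_execute(stmt);
	}
	return pruned_to;
}

/* Prune every statement up front, then execute (the problematic ordering). */
int
toy_run_prune_early(void)
{
	int			pruned_to[TOY_NSTMTS];

	toy_config_val = 1;
	for (int stmt = 0; stmt < TOY_NSTMTS; stmt++)
		pruned_to[stmt] = toy_initial_prune();
	for (int stmt = 0; stmt < TOY_NSTMTS; stmt++)
		toy_execute(stmt);
	return pruned_to[1];
}
```

With late pruning the second statement selects partition 2; with up-front pruning it selects partition 1, because the first statement's effect is not yet visible when pruning runs.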
The fix is to skip pruning-aware locking for CachedPlans containing
multiple PlannedStmts, falling back to locking all partitions.
Single-statement plans are unchanged.
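The shape of that fallback can be sketched as follows. This is an illustrative model rather than the patch's code: ToyLockMode and toy_choose_lock_mode() are hypothetical names, and the has_initial_pruning condition is an assumption added to keep the example complete.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum ToyLockMode
{
	TOY_LOCK_ALL,				/* lock every relation in the plan */
	TOY_LOCK_UNPRUNED			/* lock only partitions surviving pruning */
} ToyLockMode;

/*
 * Decide how to lock relations for a cached plan.  Pruning-aware locking
 * is attempted only for single-statement plans; anything else falls back
 * to locking all partitions.
 */
ToyLockMode
toy_choose_lock_mode(int n_planned_stmts, bool has_initial_pruning)
{
	assert(n_planned_stmts >= 0);

	/* Multi-statement plans: later statements may depend on earlier ones. */
	if (n_planned_stmts != 1)
		return TOY_LOCK_ALL;

	/* Assumption: with no initial pruning steps there is nothing to skip. */
	if (!has_initial_pruning)
		return TOY_LOCK_ALL;

	return TOY_LOCK_UNPRUNED;
}
```

The conservative branch costs nothing in correctness: it is the same locking behavior cached plans had before the optimization.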
Since multi-statement plans are now excluded, CachedPlanPrepData no
longer needs a list of EStates -- it carries a single EState pointer.
This simplifies the plumbing throughout: PortalData,
PortalDefineQuery, SPI, and EXPLAIN all pass a single optional EState
instead of walking parallel lists. The next_prep_estate() helper is
gone.
Attached is the updated set.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v10-0005-Reuse-partition-pruning-results-in-parallel-work.patch (15.8K, 2-v10-0005-Reuse-partition-pruning-results-in-parallel-work.patch)
From 33fff6e090d9c713413a68ef2bdf9721f7e7f95b Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:57 +0900
Subject: [PATCH v10 5/5] Reuse partition pruning results in parallel workers
Pass the leader's initial partition pruning results and unpruned
relids to parallel workers and reuse them via ExecutorPrep(). This
avoids repeating pruning logic in workers, which is not only
redundant but also risks divergence due to nondeterminism in pruning
steps or parameter evaluation timing.
Factor the creation of PartitionPruneState structures out of
ExecDoInitialPruning() into a new ExecCreatePartitionPruneStates()
function. Parallel workers need to set up pruning state without
performing initial pruning, since they receive the leader's results
instead.
Introduce CheckInitialPruningResultsInWorker() (debug builds only)
to verify that the results match what the worker would compute.
This check helps catch inconsistencies across leader and worker
pruning logic.
---
src/backend/executor/execMain.c | 25 +++++--
src/backend/executor/execParallel.c | 108 ++++++++++++++++++++++++++-
src/backend/executor/execPartition.c | 44 ++++++++---
src/backend/utils/cache/plancache.c | 2 +-
src/include/executor/execPartition.h | 1 +
src/include/executor/executor.h | 3 +-
6 files changed, 161 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 051b5d7bfcf..659557189ce 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -207,7 +207,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
queryDesc->params,
CurrentResourceOwner,
- eflags);
+ eflags, true);
}
#ifdef USE_ASSERT_CHECKING
else
@@ -330,7 +330,8 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
* ExecutorPrep: build initial executor state for a PlannedStmt.
*
* Performs range table initialization, permission checks, and initial
- * partition pruning if partPruneInfos are present.
+ * partition pruning if partPruneInfos are present and do_initial_pruning is
+ * true; false in a parallel worker.
*
* Returns an EState that the caller must either pass to ExecutorStart()
* for reuse or free via FreeExecutorState() if execution will not proceed.
@@ -341,7 +342,7 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*/
EState *
ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
- int eflags)
+ int eflags, bool do_initial_pruning)
{
ResourceOwner oldowner;
EState *estate;
@@ -378,14 +379,22 @@ ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
CurrentResourceOwner = owner;
/*
- * Set up PartitionPruneState structures and perform initial partition
- * pruning to compute the subset of child subplans that will be
- * executed. The results, which are bitmapsets of selected child
- * indexes, are saved in es_part_prune_results, parallel to
+ * Set up PartitionPruneState structures needed for initial
+ * partition pruning.
+ *
+ * If do_initial_pruning is true, also perform initial pruning to
+ * compute the subset of child subplans that will be executed.
+ * The results, which are bitmapsets of selected child indexes,
+ * are saved in es_part_prune_results, parallel to
* es_part_prune_infos. RT indexes of surviving partitions are
* added to es_unpruned_relids.
+ *
+ * Parallel workers pass false here and instead receive the
+ * leader's pruning results via shared memory.
*/
- ExecDoInitialPruning(estate);
+ ExecCreatePartitionPruneStates(estate);
+ if (do_initial_pruning)
+ ExecDoInitialPruning(estate);
CurrentResourceOwner = oldowner;
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 024780d3516..2de4b35a16e 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -24,6 +24,7 @@
#include "postgres.h"
#include "executor/execParallel.h"
+#include "executor/execPartition.h"
#include "executor/executor.h"
#include "executor/nodeAgg.h"
#include "executor/nodeAppend.h"
@@ -67,6 +68,8 @@
#define PARALLEL_KEY_QUERY_TEXT UINT64CONST(0xE000000000000008)
#define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE00000000000000A)
+#define PARALLEL_KEY_PARTITION_PRUNE_RESULTS UINT64CONST(0xE00000000000000B)
+#define PARALLEL_KEY_UNPRUNED_RELIDS UINT64CONST(0xE00000000000000C)
#define PARALLEL_TUPLE_QUEUE_SIZE 65536
@@ -141,6 +144,8 @@ static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
/* Helper function that runs in the parallel worker. */
static DestReceiver *ExecParallelGetReceiver(dsm_segment *seg, shm_toc *toc);
+static void CheckInitialPruningResultsInWorker(EState *estate);
+
/*
* Create a serialized representation of the plan to be sent to each worker.
*/
@@ -620,12 +625,18 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
FixedParallelExecutorState *fpes;
char *pstmt_data;
char *pstmt_space;
+ char *part_prune_results_data;
+ char *part_prune_results_space;
+ char *unpruned_relids_data;
+ char *unpruned_relids_space;
char *paramlistinfo_space;
BufferUsage *bufusage_space;
WalUsage *walusage_space;
SharedExecutorInstrumentation *instrumentation = NULL;
SharedJitInstrumentation *jit_instrumentation = NULL;
int pstmt_len;
+ int part_prune_results_len;
+ int unpruned_relids_len;
int paramlistinfo_len;
int instrumentation_len = 0;
int jit_instrumentation_len = 0;
@@ -654,6 +665,8 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
/* Fix up and serialize plan to be sent to workers. */
pstmt_data = ExecSerializePlan(planstate->plan, estate);
+ part_prune_results_data = nodeToString(estate->es_part_prune_results);
+ unpruned_relids_data = nodeToString(estate->es_unpruned_relids);
/* Create a parallel context. */
pcxt = CreateParallelContext("postgres", "ParallelQueryMain", nworkers);
@@ -680,6 +693,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
shm_toc_estimate_chunk(&pcxt->estimator, pstmt_len);
shm_toc_estimate_keys(&pcxt->estimator, 1);
+ /* Estimate space for serialized part_prune_results. */
+ part_prune_results_len = strlen(part_prune_results_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, part_prune_results_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
+ /* Estimate space for serialized unpruned_relids. */
+ unpruned_relids_len = strlen(unpruned_relids_data) + 1;
+ shm_toc_estimate_chunk(&pcxt->estimator, unpruned_relids_len);
+ shm_toc_estimate_keys(&pcxt->estimator, 1);
+
/* Estimate space for serialized ParamListInfo. */
paramlistinfo_len = EstimateParamListSpace(estate->es_param_list_info);
shm_toc_estimate_chunk(&pcxt->estimator, paramlistinfo_len);
@@ -781,6 +804,16 @@ ExecInitParallelPlan(PlanState *planstate, EState *estate,
memcpy(pstmt_space, pstmt_data, pstmt_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PLANNEDSTMT, pstmt_space);
+ /* Store serialized part_prune_results */
+ part_prune_results_space = shm_toc_allocate(pcxt->toc, part_prune_results_len);
+ memcpy(part_prune_results_space, part_prune_results_data, part_prune_results_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, part_prune_results_space);
+
+ /* Store serialized unpruned_relids */
+ unpruned_relids_space = shm_toc_allocate(pcxt->toc, unpruned_relids_len);
+ memcpy(unpruned_relids_space, unpruned_relids_data, unpruned_relids_len);
+ shm_toc_insert(pcxt->toc, PARALLEL_KEY_UNPRUNED_RELIDS, unpruned_relids_space);
+
/* Store serialized ParamListInfo. */
paramlistinfo_space = shm_toc_allocate(pcxt->toc, paramlistinfo_len);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_PARAMLISTINFO, paramlistinfo_space);
@@ -1280,10 +1313,15 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
int instrument_options)
{
char *pstmtspace;
+ char *part_prune_results_space;
+ char *unpruned_relids_space;
char *paramspace;
PlannedStmt *pstmt;
+ List *part_prune_results;
+ Bitmapset *unpruned_relids;
ParamListInfo paramLI;
char *queryString;
+ EState *prep_estate = NULL;
/* Get the query string from shared memory */
queryString = shm_toc_lookup(toc, PARALLEL_KEY_QUERY_TEXT, false);
@@ -1296,12 +1334,80 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
paramspace = shm_toc_lookup(toc, PARALLEL_KEY_PARAMLISTINFO, false);
paramLI = RestoreParamList(&paramspace);
+ /* Reconstruct leader-supplied part_prune_results and unpruned_relids. */
+ part_prune_results_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_PARTITION_PRUNE_RESULTS, false);
+ part_prune_results = (List *) stringToNode(part_prune_results_space);
+ unpruned_relids_space =
+ shm_toc_lookup(toc, PARALLEL_KEY_UNPRUNED_RELIDS, false);
+ unpruned_relids = (Bitmapset *) stringToNode(unpruned_relids_space);
+
+ /*
+ * If pruning was done in the leader, build a prep estate in the worker
+ * and inject the leader's pruning results into it for reuse.
+ */
+ if (pstmt->partPruneInfos)
+ {
+ prep_estate = ExecutorPrep(pstmt, paramLI, CurrentResourceOwner, 0, false);
+ Assert(prep_estate);
+
+ prep_estate->es_part_prune_results = part_prune_results;
+ prep_estate->es_unpruned_relids =
+ bms_add_members(prep_estate->es_unpruned_relids,
+ unpruned_relids);
+
+ /*
+ * A debug-build-only check that the pruning results passed from the
+ * leader match what the worker would independently compute.
+ */
+ CheckInitialPruningResultsInWorker(prep_estate);
+ }
+
/* Create a QueryDesc for the query. */
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
receiver, paramLI, NULL, instrument_options,
- NULL);
+ prep_estate);
+}
+
+/*
+ * CheckInitialPruningResultsInWorker
+ * Verify partition pruning results passed from the leader process.
+ *
+ * This is intended to be called during parallel worker query setup.
+ * It recomputes initial pruning results locally and compares them with
+ * those received from the leader. Any mismatch may indicate a divergence
+ * between leader and worker logic or environment.
+ *
+ * Only performed in debug builds.
+ */
+static void
+CheckInitialPruningResultsInWorker(EState *estate)
+{
+#ifdef USE_ASSERT_CHECKING
+ ListCell *lc;
+ int i;
+
+ Assert(estate->es_part_prune_results != NULL);
+ i = 0;
+ foreach(lc, estate->es_part_prune_states)
+ {
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
+ Bitmapset *reuse_validsubplans =
+ list_nth_node(Bitmapset, estate->es_part_prune_results, i++);
+ Bitmapset *validsubplans = NULL;
+ Bitmapset *validsubplan_rtis = NULL;
+
+ if (prunestate->do_initial_prune)
+ validsubplans = ExecFindMatchingSubPlans(prunestate, true,
+ &validsubplan_rtis);
+ if (!bms_equal(validsubplans, reuse_validsubplans))
+ elog(ERROR, "different validsubplans in parallel worker");
+ if (bms_nonempty_difference(validsubplan_rtis, estate->es_unpruned_relids))
+ elog(ERROR, "different unprunable_relids in parallel worker");
+ }
+#endif
}
/*
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 2a3af006f77..47322614aad 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1942,6 +1942,9 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*
* Functions:
*
+ * ExecCreatePartitionPruneStates
+ * Create PartitionPruneState for all PartitionPruneInfos in the EState
+ *
* ExecDoInitialPruning:
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
@@ -1967,15 +1970,40 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
*/
+/*
+ * ExecCreatePartitionPruneStates
+ *
+ * Create a PartitionPruneState for each PartitionPruneInfo in the estate,
+ * and save them in estate->es_part_prune_states. This setup is required
+ * before any initial or runtime pruning can occur.
+ */
+void
+ExecCreatePartitionPruneStates(EState *estate)
+{
+ ListCell *lc;
+
+ foreach(lc, estate->es_part_prune_infos)
+ {
+ PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
+ PartitionPruneState *prunestate;
+
+ /* Create and save the PartitionPruneState. */
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
+ estate->es_part_prune_states = lappend(estate->es_part_prune_states,
+ prunestate);
+ }
+}
+
/*
* ExecDoInitialPruning
* Perform runtime "initial" pruning, if necessary, to determine the set
* of child subnodes that need to be initialized during ExecInitNode() for
* plan nodes that support partition pruning.
*
- * This function iterates over each PartitionPruneInfo entry in
- * estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
- * and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
+ *
+ * This function iterates over each PartitionPruneState in
+ * estate->es_part_prune_states, which must have been populated earlier by
+ * ExecCreatePartitionPruneStates(). ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
* assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
@@ -1996,18 +2024,12 @@ ExecDoInitialPruning(EState *estate)
ListCell *lc;
Assert(estate->es_part_prune_results == NULL);
- foreach(lc, estate->es_part_prune_infos)
+ foreach(lc, estate->es_part_prune_states)
{
- PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
- PartitionPruneState *prunestate;
+ PartitionPruneState *prunestate = (PartitionPruneState *) lfirst(lc);
Bitmapset *validsubplans = NULL;
Bitmapset *validsubplan_rtis = NULL;
- /* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo);
- estate->es_part_prune_states = lappend(estate->es_part_prune_states,
- prunestate);
-
/*
* Perform initial pruning steps, if any, and save the result
* bitmapset or NULL as described in the header comment. RT indexes
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index b0c4d62564d..6c178c461a7 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -2100,7 +2100,7 @@ AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
}
prep_estate = ExecutorPrep(plannedstmt, cprep->params,
- cprep->owner, cprep->eflags);
+ cprep->owner, cprep->eflags, true);
Assert(prep_estate);
cprep->prep_estate = prep_estate;
diff --git a/src/include/executor/execPartition.h b/src/include/executor/execPartition.h
index 82063ec2a16..4c96808c376 100644
--- a/src/include/executor/execPartition.h
+++ b/src/include/executor/execPartition.h
@@ -130,6 +130,7 @@ typedef struct PartitionPruneState
PartitionPruningData *partprunedata[FLEXIBLE_ARRAY_MEMBER];
} PartitionPruneState;
+extern void ExecCreatePartitionPruneStates(EState *estate);
extern void ExecDoInitialPruning(EState *estate);
extern PartitionPruneState *ExecInitPartitionExecPruning(PlanState *planstate,
int n_total_subplans,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index fac5bef1384..37195312bce 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -240,7 +240,8 @@ extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
extern EState *ExecutorPrep(PlannedStmt *pstmt,
ParamListInfo params,
ResourceOwner owner,
- int eflags);
+ int eflags,
+ bool do_initial_pruning);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
--
2.47.3
[application/octet-stream] v10-0001-Refactor-executor-s-initial-partition-pruning-se.patch (7.3K, 3-v10-0001-Refactor-executor-s-initial-partition-pruning-se.patch)
From 6b2a9740b49a5238569cfeeb11fa632225ec2cfb Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:38 +0900
Subject: [PATCH v10 1/5] Refactor executor's initial partition pruning setup
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called before PARAM_EXEC parameters are
set up. A later commit needs this when running pruning state setup
outside of InitPlan().
No behavioral change.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++++---------
1 file changed, 48 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..2a3af006f77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1978,7 +1977,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
* and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,29 +1995,31 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ Assert(estate->es_part_prune_results == NULL);
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
/*
* Perform initial pruning steps, if any, and save the result
- * bitmapset or NULL as described in the header comment.
+ * bitmapset or NULL as described in the header comment. RT indexes
+ * of surviving partitions would be added to validsubplan_rtis.
+ *
+ * Note that when do_initial_prune is false,
+ * CreatePartitionPruneState() would have already added the RT indexes
+ * of all leaf partitions to es_unpruned_relids directly.
*/
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2137,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2376,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,9 +2389,28 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
+ }
+ }
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
}
+#endif
}
j++;
@@ -2490,9 +2508,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2539,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
--
2.47.3
[application/octet-stream] v10-0002-Introduce-ExecutorPrep-and-refactor-executor-sta.patch (23.5K, 4-v10-0002-Introduce-ExecutorPrep-and-refactor-executor-sta.patch)
From 4e849ce0af12963ee2040f187f4cb0bad1c2851e Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 26 Mar 2026 16:08:46 +0900
Subject: [PATCH v10 2/5] Introduce ExecutorPrep and refactor executor startup

Factor permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper. ExecutorStart() calls it to build the EState, keeping
behavior unchanged.

If QueryDesc->estate is already set when ExecutorStart() is called,
the existing EState is reused and ExecutorPrep() is skipped. This
allows a later commit to supply a pre-built EState from outside
the executor.

Add scaffolding for carrying an optional prep EState through
CreateQueryDesc, PortalDefineQuery, and SPI. All callers currently
pass NULL; the next commit populates these to enable pruning-aware
locking in cached plans.

In assert builds, verify that the expected relation locks are held
when entering ExecutorStart().
---
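Not part of the commit: a rough, standalone sketch of the reuse-or-create
control flow described in the commit message, for readers skimming the diff.
The Toy* types and toy_* functions are hypothetical stand-ins, not the
PostgreSQL structs or APIs. Only the shape is taken from the message: the
start routine builds the state itself when none was supplied, and otherwise
reuses the one it was given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for EState; not the real struct. */
typedef struct ToyEState
{
	bool		started;		/* set once toy_executor_start() has run */
} ToyEState;

/* Hypothetical stand-in for QueryDesc. */
typedef struct ToyQueryDesc
{
	ToyEState  *estate;			/* NULL unless a prep state was supplied */
} ToyQueryDesc;

/* Counts how many times prep work was actually performed. */
static int	toy_prep_calls = 0;

/* Models ExecutorPrep(): build the state and do the one-time setup. */
static ToyEState *
toy_executor_prep(void)
{
	ToyEState  *estate = calloc(1, sizeof(ToyEState));

	if (estate == NULL)
		abort();
	toy_prep_calls++;
	return estate;
}

/* Models ExecutorStart(): reuse a supplied state, else build one. */
static void
toy_executor_start(ToyQueryDesc *qd)
{
	assert(qd != NULL);
	if (qd->estate == NULL)
		qd->estate = toy_executor_prep();
	qd->estate->started = true;
}
```

In this toy model, a caller that hands in a pre-built state pays for prep
exactly once, and a caller that passes NULL sees the old behavior of the
state being built at start time.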
src/backend/commands/copyto.c | 2 +-
src/backend/commands/createas.c | 2 +-
src/backend/commands/explain.c | 8 +-
src/backend/commands/extension.c | 2 +-
src/backend/commands/matview.c | 2 +-
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 4 +-
src/backend/executor/README | 11 +-
src/backend/executor/execMain.c | 158 +++++++++++++++++++++++-----
src/backend/executor/execParallel.c | 3 +-
src/backend/executor/functions.c | 3 +-
src/backend/executor/spi.c | 4 +-
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 19 +++-
src/backend/utils/mmgr/portalmem.c | 7 ++
src/include/commands/explain.h | 3 +-
src/include/executor/execdesc.h | 5 +-
src/include/executor/executor.h | 7 ++
src/include/utils/portal.h | 2 +
19 files changed, 195 insertions(+), 50 deletions(-)
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index faf62d959b4..b9bd5ba7078 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -1011,7 +1011,7 @@ BeginCopyTo(ParseState *pstate,
cstate->queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(),
InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/*
* Call ExecutorStart to prepare the plan for execution.
diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index 270e9bf3110..b4a9808955a 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -336,7 +336,7 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, pstate->p_sourcetext,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, GetIntoRelEFlags(into));
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e4b70166b0e..24c0c235fd3 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -372,7 +372,7 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
}
/* run it (if needed) and produce output */
- ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
+ ExplainOnePlan(plan, NULL, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
}
@@ -494,7 +494,8 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
* to call it.
*/
void
-ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
+ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
@@ -552,7 +553,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
/* Create a QueryDesc for the query */
queryDesc = CreateQueryDesc(plannedstmt, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ dest, params, queryEnv, instrument_option,
+ prep_estate);
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index b98801d08f2..939e7a632f0 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -1174,7 +1174,7 @@ execute_sql_string(const char *sql, const char *filename)
qdesc = CreateQueryDesc(stmt,
sql,
GetActiveSnapshot(), NULL,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
ExecutorStart(qdesc, 0);
ExecutorRun(qdesc, ForwardScanDirection, 0);
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2cdfdcf984b 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -439,7 +439,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
/* Create a QueryDesc, redirecting output to our tuple receiver */
queryDesc = CreateQueryDesc(plan, queryString,
GetActiveSnapshot(), InvalidSnapshot,
- dest, NULL, NULL, 0);
+ dest, NULL, NULL, 0, NULL);
/* call ExecutorStart to prepare the plan for execution */
ExecutorStart(queryDesc, 0);
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..cf5deec4943 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..c24d97f7e5a 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ NULL,
cplan);
/*
@@ -659,7 +660,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
+ ExplainOnePlan(pstmt, NULL,
+ into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
else
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..d749ceb6687 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,18 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart (e.g., for plan validation), or
+ implicitly from ExecutorStart if not done earlier. Creates EState,
+ performs range table initialization, permission checks, and initial
+ partition pruning. Returns the EState that ExecutorStart() should
+ reuse.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if not already done, indicated by NULL QueryDesc.estate)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 58b84955c2b..cc7794f58db 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -147,7 +148,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +173,70 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When
+ * no prep EState was provided, AcquireExecutorLocks() should have
+ * locked every relation in the plan. When one was provided,
+ * pruning-aware locking should have locked at least the unpruned
+ * relations. Both checks are skipped in parallel workers, which
+ * acquire relation locks lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ queryDesc->estate = ExecutorPrep(queryDesc->plannedstmt,
+ queryDesc->params,
+ CurrentResourceOwner,
+ eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking
+ * should have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -265,6 +326,68 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep: build initial executor state for a PlannedStmt.
+ *
+ * Performs range table initialization, permission checks, and initial
+ * partition pruning if partPruneInfos are present.
+ *
+ * Returns an EState that the caller must either pass to ExecutorStart()
+ * for reuse or free via FreeExecutorState() if execution will not proceed.
+ */
+EState *
+ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
+ int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+
+ if (pstmt->commandType == CMD_UTILITY)
+ return NULL;
+
+ /* Caller must have established an active snapshot. */
+ Assert(ActiveSnapshotSet());
+
+ estate = CreateExecutorState();
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = params;
+ estate->es_top_eflags = eflags;
+
+ /*
+ * Do permissions checks.
+ */
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ /*
+ * Initialize range table.
+ */
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ /*
+ * Track resources acquired during pruning under the given
+ * ResourceOwner, which may differ from CurrentResourceOwner
+ * when ExecutorPrep() is called outside ExecutorStart().
+ */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ /*
+ * Set up PartitionPruneState structures and perform initial partition
+ * pruning to compute the subset of child subplans that will be
+ * executed. The results, which are bitmapsets of selected child
+ * indexes, are saved in es_part_prune_results, parallel to
+ * es_part_prune_infos. RT indexes of surviving partitions are
+ * added to es_unpruned_relids.
+ */
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+
+ return estate;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -840,37 +963,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ac84af294c9..024780d3516 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1300,7 +1300,8 @@ ExecParallelGetQueryDesc(shm_toc *toc, DestReceiver *receiver,
return CreateQueryDesc(pstmt,
queryString,
GetActiveSnapshot(), InvalidSnapshot,
- receiver, paramLI, NULL, instrument_options);
+ receiver, paramLI, NULL, instrument_options,
+ NULL);
}
/*
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..952a784c924 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -1369,7 +1369,8 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
dest,
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
- 0);
+ 0,
+ NULL);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..32c9d987c59 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ NULL,
cplan);
/*
@@ -2695,7 +2696,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
dest,
options->params,
_SPI_current->queryEnv,
- 0);
+ 0,
+ NULL);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index b3563113219..ccdb6c01071 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1231,6 +1231,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2030,6 +2031,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ NULL,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..42ef3e82f82 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -37,6 +37,7 @@ Portal ActivePortal = NULL;
static void ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -72,7 +73,8 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options)
+ int instrument_options,
+ EState *prep_estate)
{
QueryDesc *qd = palloc_object(QueryDesc);
@@ -93,6 +95,9 @@ CreateQueryDesc(PlannedStmt *plannedstmt,
qd->planstate = NULL;
qd->totaltime = NULL;
+ /* Use the EState created by ExecutorPrep() if already done. */
+ qd->estate = prep_estate;
+
/* not yet executed */
qd->already_executed = false;
@@ -123,6 +128,7 @@ FreeQueryDesc(QueryDesc *qdesc)
* PORTAL_ONE_RETURNING, or PORTAL_ONE_MOD_WITH portal
*
* plan: the plan tree for the query
+ * prep_estate: EState created in ExecutorPrep() for the query, if any
* sourceText: the source text of the query
* params: any parameters needed
* dest: where to send results
@@ -135,6 +141,7 @@ FreeQueryDesc(QueryDesc *qdesc)
*/
static void
ProcessQuery(PlannedStmt *plan,
+ EState *prep_estate,
const char *sourceText,
ParamListInfo params,
QueryEnvironment *queryEnv,
@@ -148,7 +155,8 @@ ProcessQuery(PlannedStmt *plan,
*/
queryDesc = CreateQueryDesc(plan, sourceText,
GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, 0);
+ dest, params, queryEnv, 0,
+ prep_estate);
/*
* Call ExecutorStart to prepare the plan for execution
@@ -495,7 +503,8 @@ PortalStart(Portal portal, ParamListInfo params,
None_Receiver,
params,
portal->queryEnv,
- 0);
+ 0,
+ portal->prep_estate);
/*
* If it's a scrollable cursor, executor needs to support
@@ -1265,7 +1274,7 @@ PortalRunMulti(Portal portal,
if (pstmt->canSetTag)
{
/* statement can set tag string */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, portal->prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
@@ -1274,7 +1283,7 @@ PortalRunMulti(Portal portal,
else
{
/* stmt added by rewrite cannot set tag */
- ProcessQuery(pstmt,
+ ProcessQuery(pstmt, portal->prep_estate,
portal->sourceText,
portal->portalParams,
portal->queryEnv,
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..0ecda763d21 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -272,6 +272,11 @@ CreateNewPortal(void)
* the passed plan trees have adequate lifetime. Typically this is done by
* copying them into the portal's context.
*
+ * If prep_estate is not NULL, it is an EState created by ExecutorPrep()
+ * during GetCachedPlan(). It will be passed to ExecutorStart() to avoid
+ * redoing range table setup and pruning. The portal takes ownership;
+ * the EState must have been allocated in the portal's memory context.
+ *
* The caller is also responsible for ensuring that the passed prepStmtName
* (if not NULL) and sourceText have adequate lifetime.
*
@@ -286,6 +291,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ EState *prep_estate,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +305,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->prep_estate = prep_estate;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..71ebe38bc86 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -64,7 +64,8 @@ extern void ExplainOneUtility(Node *utilityStmt, IntoClause *into,
ExplainState *es, ParseState *pstate,
ParamListInfo params);
-extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
+extern void ExplainOnePlan(PlannedStmt *plannedstmt, EState *prep_estate,
+ IntoClause *into,
ExplainState *es, const char *queryString,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..3a2169c9613 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
@@ -63,7 +63,8 @@ extern QueryDesc *CreateQueryDesc(PlannedStmt *plannedstmt,
DestReceiver *dest,
ParamListInfo params,
QueryEnvironment *queryEnv,
- int instrument_options);
+ int instrument_options,
+ EState *prep_estate);
extern void FreeQueryDesc(QueryDesc *qdesc);
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 07f4b1f7490..fac5bef1384 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,12 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+
+extern EState *ExecutorPrep(PlannedStmt *pstmt,
+ ParamListInfo params,
+ ResourceOwner owner,
+ int eflags);
+
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..a59e96fa11e 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,7 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ EState *prep_estate; /* EState from ExecutorPrep() if any */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +241,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ EState *prep_estate,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
[application/octet-stream] v10-0003-Use-pruning-aware-locking-in-cached-plans.patch (47.3K, 5-v10-0003-Use-pruning-aware-locking-in-cached-plans.patch)
From 648b9f5c89069692bbb46cf579576be50a9147f2 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 26 Mar 2026 18:15:39 +0900
Subject: [PATCH v10 3/5] Use pruning-aware locking in cached plans

Extend GetCachedPlan()'s lock acquisition to perform initial
partition pruning via ExecutorPrep(), then lock only the surviving
partitions. This avoids unnecessary locking of pruned partitions
when reusing a generic cached plan.

Introduce CachedPlanPrepData to carry the EState created by
ExecutorPrep() through the plan caching layer. The prep_estate
field is populated when GetCachedPlan() prepares a reused
single-statement generic plan. Adjust call sites in SPI,
portals, and EXPLAIN to propagate this to ExecutorStart().

Disable pruning-aware locking for multi-statement CachedPlans, which
arise from rule rewriting. PortalRunMulti() executes such statements
sequentially with CommandCounterIncrement() between them, so later
statements' pruning expressions may see different results depending
on when they are evaluated. Evaluating all statements' pruning
upfront during GetCachedPlan() would produce stale results for later
statements. Additionally, PortalRunMulti() calls
MemoryContextDeleteChildren(portalContext) between statements, which
would destroy EStates prepared for later statements. The fallback
to locking all partitions is safe and sufficient here; multi-statement
plans from rule rewriting are uncommon.

Partition pruning expressions may call PL functions that require
an active snapshot (e.g., via EnsurePortalSnapshotExists()).
AcquireExecutorLocksUnpruned() establishes one before calling
ExecutorPrep() if needed, ensuring these expressions can execute
correctly during plan cache validation.

To maintain correctness when all target partitions are pruned, also
reinstate the firstResultRel locking behavior lost in commit
28317de72. That commit required the first ModifyTable target to
remain initialized for executor assumptions to hold. We now
explicitly track these relids in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving that rule across cached plan
reuse.

Regression tests are included to verify:

- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.
- Multi-statement CachedPlans (from rule rewriting) fall back to
locking all partitions, avoiding stale pruning and use-after-free.

Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this was
an optimization (avoid processing pruned partitions); it is now a
correctness requirement, because pruned partitions may not be locked.
ExecGetRangeTableRelation() already enforces this with an error when
called on a pruned relation.
---
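Not part of the commit: a toy illustration of the locking rule and the
extension-author check described in the commit message. It models a relid
set as a 64-bit mask instead of a Bitmapset, and every Toy*/toy_* name is
hypothetical. The lock set shown here (unprunable relations, plus survivors
of initial pruning, plus first result relations) is my reading of the
message, not a copy of what AcquireExecutorLocksUnpruned() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy relid set: bit i set means RT index i is a member (limit of 64). */
typedef uint64_t ToyRelids;

static ToyRelids
toy_add_member(ToyRelids set, int rti)
{
	assert(rti >= 0 && rti < 64);
	return set | (UINT64_C(1) << rti);
}

static bool
toy_is_member(int rti, ToyRelids set)
{
	assert(rti >= 0 && rti < 64);
	return (set & (UINT64_C(1) << rti)) != 0;
}

/*
 * Relations to lock under the assumed rule: everything that cannot be
 * pruned, every partition that survived initial pruning, and the first
 * result relation of each ModifyTable even if it was pruned.
 */
static ToyRelids
toy_relids_to_lock(ToyRelids unprunable, ToyRelids survivors,
				   ToyRelids first_result_rels)
{
	return unprunable | survivors | first_result_rels;
}

/*
 * Guard in the spirit of the note to extension authors: only open a
 * relation whose RT index is in the unpruned set, since a pruned
 * relation may not be locked.
 */
static bool
toy_may_open(int rti, ToyRelids unpruned)
{
	return toy_is_member(rti, unpruned);
}
```

The point of the guard is that skipping a pruned partition is no longer just
a shortcut: in this model, opening RT index 4 when only 1 and 3 are unpruned
would touch a relation nobody locked.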
src/backend/commands/prepare.c | 19 +-
src/backend/executor/execMain.c | 4 +
src/backend/executor/functions.c | 1 +
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/executor/spi.c | 24 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 18 ++
src/backend/tcop/postgres.c | 8 +-
src/backend/tcop/pquery.c | 1 +
src/backend/utils/cache/plancache.c | 246 +++++++++++++++++-
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 38 ++-
src/test/regress/expected/partition_prune.out | 184 +++++++++++++
src/test/regress/expected/plancache.out | 63 +++++
src/test/regress/sql/partition_prune.sql | 116 +++++++++
src/test/regress/sql/plancache.sql | 52 ++++
17 files changed, 769 insertions(+), 24 deletions(-)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index c24d97f7e5a..621fd30fd5e 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -156,6 +156,7 @@ ExecuteQuery(ParseState *pstate,
{
PreparedStatement *entry;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ParamListInfo paramLI = NULL;
EState *estate = NULL;
@@ -195,8 +196,11 @@ ExecuteQuery(ParseState *pstate,
entry->plansource->query_string);
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL, &cprep);
plan_list = cplan->stmt_list;
+ Assert(cprep.prep_estate == NULL || list_length(plan_list) == 1);
/*
* DO NOT add any logic that could possibly throw an error between
@@ -207,7 +211,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
- NULL,
+ cprep.prep_estate,
cplan);
/*
@@ -577,6 +581,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
PreparedStatement *entry;
const char *query_string;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *plan_list;
ListCell *p;
ParamListInfo paramLI = NULL;
@@ -633,8 +638,13 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
+ if (es->generic)
+ cprep.eflags = EXEC_FLAG_EXPLAIN_GENERIC;
cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ CurrentResourceOwner, pstate->p_queryEnv,
+ &cprep);
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
@@ -655,12 +665,13 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
+ Assert(cprep.prep_estate == NULL || list_length(plan_list) == 1);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
if (pstmt->commandType != CMD_UTILITY)
- ExplainOnePlan(pstmt, NULL,
+ ExplainOnePlan(pstmt, cprep.prep_estate,
into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
es->memory ? &mem_counters : NULL);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index cc7794f58db..051b5d7bfcf 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -334,6 +334,10 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
*
* Returns an EState that the caller must either pass to ExecutorStart()
* for reuse or free via FreeExecutorState() if execution will not proceed.
+ * GetCachedPlan() uses this to determine, based on initial pruning
+ * results, which partitions to lock; if the resulting EState is not
+ * delivered to ExecutorStart(), the executor would operate on unlocked
+ * relations. See the assert checks in standard_ExecutorStart().
*/
EState *
ExecutorPrep(PlannedStmt *pstmt, ParamListInfo params, ResourceOwner owner,
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 952a784c924..c0ca72b38dd 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -699,6 +699,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
+ NULL,
NULL);
/*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 4cd5e262e0f..9230f2b554f 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -4865,8 +4865,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksUnpruned(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -4880,6 +4880,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 32c9d987c59..eb9552f85db 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1580,6 +1580,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
{
CachedPlanSource *plansource;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
List *stmt_list;
char *query_string;
Snapshot snapshot;
@@ -1660,8 +1661,12 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
*/
/* Replan if needed, and increment plan refcount for portal */
- cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
+ Assert(cprep.prep_estate == NULL || list_length(stmt_list) == 1);
if (!plan->saved)
{
@@ -1670,7 +1675,10 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
* so must copy the plan into the portal's context. An error here
* will result in leaking our refcount on the plan, but it doesn't
* matter because the plan is unsaved and hence transient anyway.
+ *
+ * Unsaved plans use custom plans, so prep should be a no-op.
*/
+ Assert(cprep.prep_estate == NULL);
oldcontext = MemoryContextSwitchTo(portal->portalContext);
stmt_list = copyObject(stmt_list);
MemoryContextSwitchTo(oldcontext);
@@ -1686,7 +1694,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
- NULL,
+ cprep.prep_estate,
cplan);
/*
@@ -2104,7 +2112,8 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
/* Get the generic plan for the query */
cplan = GetCachedPlan(plansource, NULL,
plan->saved ? CurrentResourceOwner : NULL,
- _SPI_current->queryEnv);
+ _SPI_current->queryEnv,
+ NULL);
Assert(cplan == plansource->gplan);
/* Pop the error context stack */
@@ -2501,6 +2510,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
CachedPlanSource *plansource = (CachedPlanSource *) lfirst(lc1);
List *stmt_list;
ListCell *lc2;
+ CachedPlanPrepData cprep = {0};
spicallbackarg.query = plansource->query_string;
@@ -2575,8 +2585,11 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
+ cprep.context = CurrentMemoryContext;
+ cprep.owner = CurrentResourceOwner;
cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
+ plan_owner, _SPI_current->queryEnv,
+ &cprep);
stmt_list = cplan->stmt_list;
@@ -2616,6 +2629,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
}
}
+ Assert(cprep.prep_estate == NULL || list_length(stmt_list) == 1);
foreach(lc2, stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc2);
@@ -2697,7 +2711,7 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
options->params,
_SPI_current->queryEnv,
0,
- NULL);
+ cprep.prep_estate);
res = _SPI_pquery(qdesc, fire_triggers,
canSetTag ? options->tcount : 0);
FreeQueryDesc(qdesc);
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 42604a0f75c..afa61d357c5 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
result->resultRelations = glob->resultRelations;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index 1b5b9b5ed9c..8c9956e687e 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,24 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of
+ * initially prunable relations. We use bms_next_member() to get
+ * the lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition
+ * expansion preserves RT index order. ExecInitModifyTable() asserts
+ * that the recorded index matches what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index ccdb6c01071..487258641a5 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1637,6 +1637,7 @@ exec_bind_message(StringInfo input_message)
int16 *rformats = NULL;
CachedPlanSource *psrc;
CachedPlan *cplan;
+ CachedPlanPrepData cprep = {0};
Portal portal;
char *query_string;
char *saved_stmt_name;
@@ -2018,7 +2019,10 @@ exec_bind_message(StringInfo input_message)
* will be generated in MessageContext. The plan refcount will be
* assigned to the Portal, so it will be released at portal destruction.
*/
- cplan = GetCachedPlan(psrc, params, NULL, NULL);
+ cprep.context = portal->portalContext;
+ cprep.owner = portal->resowner;
+ cplan = GetCachedPlan(psrc, params, NULL, NULL, &cprep);
+ Assert(cprep.prep_estate == NULL || list_length(cplan->stmt_list) == 1);
/*
* Now we can define the portal.
@@ -2031,7 +2035,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
- NULL,
+ cprep.prep_estate,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 42ef3e82f82..b52c4c619ee 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -1214,6 +1214,7 @@ PortalRunMulti(Portal portal,
* Loop to handle the individual queries generated from a single parsetree
* by analysis and rewrite.
*/
+ Assert(portal->prep_estate == NULL || list_length(portal->stmts) == 1);
foreach(stmtlist_item, portal->stmts)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, stmtlist_item);
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..b0c4d62564d 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -93,14 +93,17 @@ static bool StmtPlanRequiresRevalidation(CachedPlanSource *plansource);
static bool BuildingPlanRequiresSnapshot(CachedPlanSource *plansource);
static List *RevalidateCachedQuery(CachedPlanSource *plansource,
QueryEnvironment *queryEnv);
-static bool CheckCachedPlan(CachedPlanSource *plansource);
+static bool CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep);
static CachedPlan *BuildCachedPlan(CachedPlanSource *plansource, List *qlist,
ParamListInfo boundParams, QueryEnvironment *queryEnv);
static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksAll(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep);
+static void CachedPlanPrepCleanup(CachedPlanPrepData *cprep);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -942,6 +945,12 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
/*
* CheckCachedPlan: see if the CachedPlanSource's generic plan is valid.
*
+ * If 'cprep' is not NULL and the generic plan contains only a single
+ * statement, ExecutorPrep() is applied to that PlannedStmt to compute the set
+ * of partitions that survive initial runtime pruning in order to only lock
+ * them. The EState is saved in cprep->prep_estate, which must be passed to
+ * ExecutorStart() for reuse.
+ *
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
@@ -949,7 +958,7 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* (We must do this for the "true" result to be race-condition-free.)
*/
static bool
-CheckCachedPlan(CachedPlanSource *plansource)
+CheckCachedPlan(CachedPlanSource *plansource, CachedPlanPrepData *cprep)
{
CachedPlan *plan = plansource->gplan;
@@ -983,7 +992,19 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
+ /*
+ * Multi-statement CachedPlans (from rule rewriting) must not
+ * use pruning-aware locking, because later statements' pruning
+ * expressions could see stale results if evaluated before
+ * earlier statements have executed.
+ */
+ if (cprep && list_length(plan->stmt_list) > 1)
+ cprep = NULL;
+
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, true, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, true);
/*
* If plan was transient, check to see if TransactionXmin has
@@ -1005,7 +1026,13 @@ CheckCachedPlan(CachedPlanSource *plansource)
}
/* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
+ if (cprep)
+ AcquireExecutorLocksUnpruned(plan->stmt_list, false, cprep);
+ else
+ AcquireExecutorLocksAll(plan->stmt_list, false);
+
+ /* Also clean up ExecutorPrep() state, if necessary. */
+ CachedPlanPrepCleanup(cprep);
}
/*
@@ -1285,6 +1312,16 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* On return, the plan is valid and we have sufficient locks to begin
* execution.
*
+ * If 'cprep' is not NULL and a single-statement generic plan is reused,
+ * the function performs initial pruning via ExecutorPrep() and locks only
+ * the surviving partitions. The resulting EState is stored in
+ * cprep->prep_estate and must be delivered to ExecutorStart() via
+ * QueryDesc->estate (or the equivalent portal/SPI path). Failure
+ * to do so means the executor will operate on relations for which
+ * locks were never acquired. Passing NULL for cprep is always safe;
+ * all partitions are locked as before. Multi-statement plans also
+ * fall back to locking all partitions.
+ *
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
* the refcount has been reported to that ResourceOwner (note that this
@@ -1295,7 +1332,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
*/
CachedPlan *
GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
- ResourceOwner owner, QueryEnvironment *queryEnv)
+ ResourceOwner owner, QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep)
{
CachedPlan *plan = NULL;
List *qlist;
@@ -1317,7 +1355,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
if (!customplan)
{
- if (CheckCachedPlan(plansource))
+ if (cprep)
+ cprep->params = boundParams;
+ if (CheckCachedPlan(plansource, cprep))
{
/* We want a generic plan, and we already have a valid one */
plan = plansource->gplan;
@@ -1904,11 +1944,13 @@ QueryListGetPrimaryStmt(List *stmts)
}
/*
- * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
- * or release them if acquire is false.
+ * AcquireExecutorLocksAll: acquire locks needed for execution of a cached
+ * plan; or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksAll(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1997,190 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * LockRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksUnpruned().
+ */
+static void
+LockRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not
+ * fail if it's been dropped entirely --- we'll just transiently
+ * acquire a non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksUnpruned
+ * Acquire or release execution locks for only unpruned relations
+ * referenced by the given single-statement PlannedStmt list.
+ *
+ * On acquire, this:
+ * - locks unprunable rels listed in PlannedStmt.unprunableRelids
+ * - runs ExecutorPrep() to perform initial runtime pruning
+ * - locks the surviving partitions reported in the prep estate
+ * - stores the EState in cprep->prep_estate
+ *
+ * On release, it:
+ * - uses the EState in cprep->prep_estate to determine which
+ * relids to unlock
+ *
+ * Memory allocation for the EState happens in cprep->context.
+ * Locks are acquired using cprep->owner.
+ */
+static void
+AcquireExecutorLocksUnpruned(List *stmt_list, bool acquire,
+ CachedPlanPrepData *cprep)
+{
+ MemoryContext oldcontext;
+ ListCell *lc1;
+ EState *prep_estate;
+
+ Assert(cprep);
+ oldcontext = MemoryContextSwitchTo(cprep->context);
+ /*
+ * When releasing locks, use the EState created during acquisition to
+ * determine which relids to unlock.
+ */
+ prep_estate = cprep->prep_estate;
+ Assert(!acquire || prep_estate == NULL);
+ foreach(lc1, stmt_list)
+ {
+ PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
+
+ if (plannedstmt->commandType == CMD_UTILITY)
+ {
+ /* Same as AcquireExecutorLocksAll(). */
+ Query *query = UtilityContainsQuery(plannedstmt->utilityStmt);
+
+ if (query)
+ ScanQueryForLocks(query, acquire);
+ continue;
+ }
+
+ /*
+ * Lock tables mentioned in the original query and other unprunable
+ * relations that were added to the plan via inheritance expansion.
+ */
+ LockRelids(plannedstmt->rtable, plannedstmt->unprunableRelids, acquire);
+
+ /* Lock partitions surviving initial runtime pruning. */
+ if (acquire)
+ {
+ /*
+ * Pruning expressions may call PL functions that require an active
+ * snapshot (e.g., via EnsurePortalSnapshotExists()). Establish one
+ * if needed.
+ */
+ bool snap_pushed = false;
+
+ if (!ActiveSnapshotSet())
+ {
+ PushActiveSnapshot(GetTransactionSnapshot());
+ snap_pushed = true;
+ }
+
+ prep_estate = ExecutorPrep(plannedstmt, cprep->params,
+ cprep->owner, cprep->eflags);
+ Assert(prep_estate);
+ cprep->prep_estate = prep_estate;
+
+ if (snap_pushed)
+ PopActiveSnapshot();
+ }
+
+ if (prep_estate)
+ {
+ /*
+ * es_unpruned_relids includes plannedstmt->unprunableRelids,
+ * which we've already locked. Filter them out to avoid double-locking.
+ */
+ Bitmapset *lock_relids = bms_difference(prep_estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * We must always lock the first result relation of each
+ * ModifyTable node in the plan, that is, each relation listed in
+ * plannedstmt->firstResultRels, even if it has been pruned, to
+ * satisfy the executor assumptions described in
+ * ExecInitModifyTable(). This can be wasteful, because the
+ * first result relation is not needed at all if other result
+ * relations are unpruned and thus sufficient for the ModifyTable
+ * node's needs. Unfortunately, we don't have a per-node
+ * unpruned_relids set with which to determine whether other
+ * result relations are included.
+ */
+ if (plannedstmt->resultRelations)
+ {
+ ListCell *lc2;
+
+ foreach(lc2, plannedstmt->firstResultRels)
+ {
+ Index firstResultRel = lfirst_int(lc2);
+
+ if (!bms_is_member(firstResultRel, lock_relids))
+ lock_relids = bms_add_member(lock_relids, firstResultRel);
+ }
+ }
+
+ LockRelids(plannedstmt->rtable, lock_relids, acquire);
+ bms_free(lock_relids);
+ }
+ }
+
+ MemoryContextSwitchTo(oldcontext);
+}
+
+/*
+ * CachedPlanPrepCleanup
+ * Dispose of EState built during pruning-aware lock acquisition.
+ *
+ * This is used when CheckCachedPlan() discovers that a CachedPlan has
+ * become invalid after AcquireExecutorLocksUnpruned() has already run.
+ * The execution locks have already been released by that point; this
+ * function frees the EState that the executor will never see.
+ */
+static void
+CachedPlanPrepCleanup(CachedPlanPrepData *cprep)
+{
+ EState *prep_estate;
+ ResourceOwner oldowner;
+
+ if (cprep == NULL)
+ return;
+
+ /* Switch to owner that ExecutorPrep() would have used. */
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = cprep->owner;
+
+ prep_estate = cprep->prep_estate;
+ Assert(prep_estate);
+ ExecCloseRangeTableRelations(prep_estate);
+ FreeExecutorState(prep_estate);
+ CurrentResourceOwner = oldowner;
+
+ cprep->prep_estate = NULL;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 27758ec16fe..4fd9d9bcc56 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index b6185825fcb..55279cbbda8 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -121,6 +121,16 @@ typedef struct PlannedStmt
/* integer list of RT indexes, or NIL */
List *resultRelations;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..1a153b816eb 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -27,6 +27,9 @@
typedef struct Query Query;
typedef struct RawStmt RawStmt;
+/* to avoid including execnodes.h */
+typedef struct EState EState;
+
/* possible values for plan_cache_mode */
typedef enum
{
@@ -196,6 +199,38 @@ typedef struct CachedExpression
dlist_node node; /* link in global list of CachedExpressions */
} CachedExpression;
+/*
+ * CachedPlanPrepData
+ * Carries ExecutorPrep results for a CachedPlan's PlannedStmt,
+ * along with context and owner information needed to allocate them.
+ *
+ * prep_estate is populated when GetCachedPlan() prepares a reused
+ * single-statement generic plan. Multi-statement plans (from rule
+ * rewriting) fall back to locking all partitions and leave this NULL.
+ * If the plan is found invalid after locking, the EState is freed
+ * by CachedPlanPrepCleanup() before retrying.
+ *
+ * ExecutorPrep state is allocated in 'context' and owned by 'owner'.
+ *
+ * eflags controls ExecutorPrep() behavior during initial pruning.
+ * Normally zero; set EXEC_FLAG_EXPLAIN_GENERIC to suppress pruning
+ * in EXPLAIN (GENERIC_PLAN). Need not match the eflags later passed
+ * to ExecutorStart().
+ *
+ * prep_estate must reach ExecutorStart() to be adopted for execution.
+ * If the plan is invalidated before that happens, CachedPlanPrepCleanup()
+ * frees it instead. The EState is allocated in 'context' and its
+ * resources tracked under 'owner', which the caller sets to match the
+ * execution environment (e.g., portal context and resowner).
+ */
+typedef struct CachedPlanPrepData
+{
+ EState *prep_estate; /* EState for the PlannedStmt */
+ ParamListInfo params; /* params visible to ExecutorPrep */
+ MemoryContext context; /* where to allocate EState and its fields */
+ ResourceOwner owner; /* ResourceOwner for ExecutorPrep state */
+ int eflags; /* executor flags to control ExecutorPrep */
+} CachedPlanPrepData;
extern void InitPlanCache(void);
extern void ResetPlanCache(void);
@@ -240,7 +275,8 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource,
extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
- QueryEnvironment *queryEnv);
+ QueryEnvironment *queryEnv,
+ CachedPlanPrepData *cprep);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..61781389d2f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,187 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+ a | b
+---+---
+ 1 | 0
+ 2 | 1
+(2 rows)
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..3043dbfac2d 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,66 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..692415a8d9f 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,119 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..6a8b8787de6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,55 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
[application/octet-stream] v10-0004-Make-SQL-function-executor-track-ExecutorPrep-st.patch (7.7K, 6-v10-0004-Make-SQL-function-executor-track-ExecutorPrep-st.patch)
From 5769f6ca7c9ffcee1b51d27105c780c5d6102f55 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 10 Feb 2026 22:09:23 +0900
Subject: [PATCH v10 4/5] Make SQL function executor track ExecutorPrep state
Extend the SQL function executor to use the ExecutorPrep results
returned by GetCachedPlan(). init_execution_state() now passes a
CachedPlanPrepData to GetCachedPlan() and stores the per-statement
ExecPrep pointers in the execution_state nodes.
At execution time, postquel_start() reparents the prep EState's
es_query_cxt under the function's subcontext so that prep state
follows the usual per-call context hierarchy.
This allows SQL language functions to participate in the same
ExecutorPrep machinery as other plan cache users.
Add a regression test where rule rewrite expands a single UPDATE
into multiple PlannedStmts, exercising the SQL function plan cache
and the generic plan reuse path that now invokes ExecutorPrep.
---
src/backend/executor/functions.c | 27 ++++++++++++--
src/test/regress/expected/plancache.out | 48 +++++++++++++++++++++++++
src/test/regress/sql/plancache.sql | 34 ++++++++++++++++++
3 files changed, 107 insertions(+), 2 deletions(-)
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index c0ca72b38dd..2be816b6a75 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -73,6 +73,7 @@ typedef struct execution_state
bool setsResult; /* true if this query produces func's result */
bool lazyEval; /* true if should fetch one row at a time */
PlannedStmt *stmt; /* plan for this query */
+ EState *prep_estate; /* EState created in ExecutorPrep() for this plan */
QueryDesc *qd; /* null unless status == RUN */
} execution_state;
@@ -658,6 +659,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
execution_state *lasttages = NULL;
int nstmts;
ListCell *lc;
+ CachedPlanPrepData cprep = {0};
/*
* Clean up after previous query, if there was one.
@@ -696,11 +698,20 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
+
+ /*
+ * Have ExecutorPrep() allocate under fcache->fcontext. The prep
+ * EStates it creates will initially live there; postquel_start()
+ * will later reparent their es_query_cxt into fcache->subcontext
+ * when using them for execution.
+ */
+ cprep.context = fcache->fcontext;
+ cprep.owner = fcache->cowner;
fcache->cplan = GetCachedPlan(plansource,
fcache->paramLI,
fcache->cowner,
NULL,
- NULL);
+ &cprep);
/*
* If necessary, make esarray[] bigger to hold the needed state.
@@ -721,6 +732,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
/*
* Build execution_state list to match the number of contained plans.
*/
+ Assert(cprep.prep_estate == NULL || list_length(fcache->cplan->stmt_list) == 1);
foreach(lc, fcache->cplan->stmt_list)
{
PlannedStmt *stmt = lfirst_node(PlannedStmt, lc);
@@ -765,6 +777,7 @@ init_execution_state(SQLFunctionCachePtr fcache)
newes->setsResult = false; /* might change below */
newes->lazyEval = false; /* might change below */
newes->stmt = stmt;
+ newes->prep_estate = cprep.prep_estate;
newes->qd = NULL;
if (stmt->canSetTag)
@@ -1363,6 +1376,15 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
else
dest = None_Receiver;
+ /*
+ * Prep EStates were built under fcache->fcontext. For execution,
+ * make their es_query_cxt a child of fcache->subcontext so they
+ * follow the usual per call lifetime.
+ */
+ if (es->prep_estate)
+ MemoryContextSetParent(es->prep_estate->es_query_cxt,
+ fcache->subcontext);
+
es->qd = CreateQueryDesc(es->stmt,
fcache->func->src,
GetActiveSnapshot(),
@@ -1371,7 +1393,7 @@ postquel_start(execution_state *es, SQLFunctionCachePtr fcache)
fcache->paramLI,
es->qd ? es->qd->queryEnv : NULL,
0,
- NULL);
+ es->prep_estate);
/* Utility commands don't need Executor. */
if (es->qd->operation != CMD_UTILITY)
@@ -1462,6 +1484,7 @@ postquel_end(execution_state *es, SQLFunctionCachePtr fcache)
FreeQueryDesc(es->qd);
es->qd = NULL;
+ es->prep_estate = NULL;
MemoryContextSwitchTo(oldcontext);
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 3043dbfac2d..547846b2945 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -460,4 +460,52 @@ NOTICE: creating index on partition inval_during_pruning_p1
deallocate inval_during_pruning_q;
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+set plan_cache_mode = force_generic_plan;
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+insert into sqlf_base values (1, 10);
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+select sqlf_execprep_test(1, 20);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select sqlf_execprep_test(1, 30);
+ sqlf_execprep_test
+--------------------
+
+(1 row)
+
+select * from sqlf_base order by 1;
+ id | val
+----+-----
+ 1 | 30
+(1 row)
+
+select * from sqlf_log order by 1;
+ id | note
+----+----------------
+ 1 | logged by rule
+ 1 | logged by rule
+(2 rows)
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 6a8b8787de6..532fa58518b 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -274,4 +274,38 @@ deallocate inval_during_pruning_q;
drop table inval_during_pruning_p, inval_during_pruning_signal;
drop function invalidate_plancache_func, stable_pruning_val;
+-- exercise sql-function plan cache when rewrite expands a single statement
+-- into multiple planned statements. this forces cachedplan->stmt_list to
+-- contain more than one entry and checks that executor state for the first
+-- rewritten statement does not destroy state needed by the second one.
+
+set plan_cache_mode = force_generic_plan;
+
+create table sqlf_base(id int, val int) partition by list (id);
+create table sqlf_base_1 partition of sqlf_base for values in (1);
+create table sqlf_base_2 partition of sqlf_base for values in (2);
+create table sqlf_log(id int, note text);
+
+insert into sqlf_base values (1, 10);
+
+create rule sqlf_base_upd_log as
+on update to sqlf_base do also
+ insert into sqlf_log(id, note)
+ values (new.id, 'logged by rule');
+
+create or replace function sqlf_execprep_test(a int, v int)
+returns void
+language sql
+as $$
+ update sqlf_base set val = v where id = a;
+$$;
+
+select sqlf_execprep_test(1, 20);
+select sqlf_execprep_test(1, 30);
+select * from sqlf_base order by 1;
+select * from sqlf_log order by 1;
+
+drop rule sqlf_base_upd_log on sqlf_base;
+drop table sqlf_base, sqlf_log;
+drop function sqlf_execprep_test;
reset plan_cache_mode;
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2026-03-27 09:00 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2026-03-27 09:00 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
On Thu, Mar 26, 2026 at 6:24 PM Amit Langote <[email protected]> wrote:
> On Wed, Mar 25, 2026 at 4:39 PM Amit Langote <[email protected]> wrote:
> > On Fri, Mar 20, 2026 at 2:20 AM Amit Langote <[email protected]> wrote:
> > > On Mon, Mar 9, 2026 at 1:41 PM Amit Langote <[email protected]> wrote:
> > > Stepping back -- the core question is whether running executor logic
> > > (pruning) inside GetCachedPlan() is acceptable at all. The plan cache
> > > and executor have always had a clean boundary: plan cache locks
> > > everything, executor runs. This optimization necessarily crosses that
> > > line, because the information needed to decide which locks to skip
> > > (pruning results) can only come from executor machinery.
> > >
> > > The proposed approach has GetCachedPlan() call ExecutorPrep() to do a
> > > limited subset of executor work (range table init, permissions,
> > > pruning), carry the results out through CachedPlanPrepData, and leave
> > > the CachedPlan itself untouched. The executor already has a multi-step
> > > protocol: start/run/end. prep/start/run/end is just a finer
> > > decomposition of what InitPlan() was already doing inside
> > > ExecutorStart().
> > >
> > > Of the attached patches, I'm targeting 0001-0003 for commit. 0004 (SQL
> > > function support) and 0005 (parallel worker reuse) are useful
> > > follow-ons but not essential. The optimization works without them for
> > > most cases, and they can be reviewed and committed separately.
> > >
> > > If there's a cleaner way to avoid locking pruned partitions without
> > > the plumbing this patch adds, I haven't found it in the year since the
> > > revert. I'd welcome a pointer if you see one. Failing that, I think
> > > this is the right trade-off, but it's a judgment call about where to
> > > hold your nose.
> > >
> > > Tom, I'd value your opinion on whether this approach is something
> > > you'd be comfortable seeing in the tree.
> >
> > Attached is an updated set with some cleanup after another pass.
> >
> > - Removed ExecCreatePartitionPruneStates() from 0001. In 0001-0003,
> > ExecDoInitialPruning() handles both setup and pruning internally; the
> > split isn't needed yet.
> >
> > - Tightened commit messages to describe what each commit does now, not
> > what later commits will use it for. In particular, 0002 is upfront
> > that the portal/SPI/EXPLAIN plumbing is scaffolding that 0003 lights
> > up.
> >
> > - Updated setrefs.c comment for firstResultRels to drop a blanket
> > claim about one ModifyTable per query level.
> >
> > As before, 0001-0003 is the focus, and possibly 0004, which teaches the
> > new GetCachedPlan() pruning-aware contract to its relatively new user in
> > functions.c.
>
> While reviewing the patch more carefully, I realized there's a
> correctness issue when rule rewriting causes a single statement to
> expand into multiple PlannedStmts in one CachedPlan.
>
> PortalRunMulti() executes those statements sequentially, with
> CommandCounterIncrement() between them, so Q2's ExecutorStart()
> normally sees the effects of Q1.
>
> With the patch, though, AcquireExecutorLocksUnpruned() runs
> ExecutorPrep() on all PlannedStmts in one pass during GetCachedPlan(),
> before any statement executes. If a later statement has
> initial-pruning expressions that read data modified by an earlier one,
> pruning can see stale results.
>
> There's also a memory lifetime issue: PortalRunMulti() calls
> MemoryContextDeleteChildren(portalContext) between statements, which
> destroys EStates prepared for later statements.
>
> Here's a concrete case demonstrating the semantic issue:
>
> create table multistmt_pt (a int, b int) partition by list (a);
> create table multistmt_pt_1 partition of multistmt_pt for values in (1);
> create table multistmt_pt_2 partition of multistmt_pt for values in (2);
> insert into multistmt_pt values (1, 0), (2, 0);
>
> create table prune_config (val int);
> insert into prune_config values (1);
>
> create function get_prune_val() returns int as $$
> select val from prune_config;
> $$ language sql stable;
>
> -- rule action runs first, updating prune_config before the
> -- original statement's pruning would normally be evaluated
> create rule config_upd_rule as on update to multistmt_pt
> do also update prune_config set val = 2;
>
> set plan_cache_mode to force_generic_plan;
> prepare multi_q as
> update multistmt_pt set b = b + 1 where a = get_prune_val();
> execute multi_q; -- creates the generic plan
>
> -- reset for the real test
> update prune_config set val = 1;
> update multistmt_pt set b = 0;
>
> -- second execute reuses the plan
> execute multi_q;
> select * from multistmt_pt order by a;
>
> Without the patch: the rule action updates prune_config to val=2
> first, then after CCI the original statement's initial pruning calls
> get_prune_val(), gets 2, prunes to multistmt_pt_2, and updates it
> correctly: (1, 0), (2, 1).
>
> With the patch as it stood: both statements' pruning runs during
> GetCachedPlan() before either executes. The original statement's
> pruning sees val=1, prunes to multistmt_pt_1, and multistmt_pt_2 is
> never touched.
>
> The fix is to skip pruning-aware locking for CachedPlans containing
> multiple PlannedStmts, falling back to locking all partitions.
> Single-statement plans are unchanged.
For good measure, I also verified that Tom's test case from last May
[1] that prompted the revert of the previous commit works correctly
with this patch. When the DO ALSO rule is created mid-execution, the
plan gets invalidated and rebuilt as a multi-statement CachedPlan,
which triggers the fallback to locking all partitions. No assertions,
no crashes.
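To make the ordering hazard from the quoted example easier to see, here is a toy model in C. It is only a sketch: ToyDb and the toy_* functions are hypothetical names invented for illustration and do not exist in PostgreSQL. The prune_up_front flag stands for evaluating the original statement's pruning before the rule action has run; with the flag off, pruning is evaluated after the rule action, which is the ordering the multi-statement fallback preserves.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: every name here is hypothetical, not a PostgreSQL API. */
typedef struct ToyDb
{
	int			prune_config_val;	/* models prune_config.val */
	int			b[3];		/* b[1], b[2] model column b in partitions 1, 2 */
} ToyDb;

/* Models get_prune_val(): reads whatever state is current when called. */
static int
toy_get_prune_val(const ToyDb *db)
{
	return db->prune_config_val;
}

/* Models the DO ALSO rule action: update prune_config set val = 2. */
static void
toy_rule_action(ToyDb *db)
{
	db->prune_config_val = 2;
}

/* Models the original UPDATE, restricted to the one surviving partition. */
static void
toy_update_partition(ToyDb *db, int part)
{
	assert(part == 1 || part == 2);
	db->b[part] += 1;
}

/*
 * Models "execute multi_q".  For ON UPDATE rules the rule action runs
 * before the original statement.  The only difference between the two
 * modes is when the pruning expression is evaluated.
 */
static void
toy_execute_multi_q(ToyDb *db, bool prune_up_front)
{
	int			surviving = 0;

	if (prune_up_front)
		surviving = toy_get_prune_val(db);	/* sees the old value */

	toy_rule_action(db);	/* statement 1, followed by CCI */

	if (!prune_up_front)
		surviving = toy_get_prune_val(db);	/* sees statement 1's effect */

	toy_update_partition(db, surviving);	/* statement 2 */
}
```

Starting from val = 1 and b = 0 in both partitions, calling toy_execute_multi_q() with prune_up_front set updates partition 1, while calling it with the flag clear updates partition 2, mirroring the two outcomes described in the quoted text.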
--
Thanks, Amit Langote
[1] https://postgr.es/m/[email protected]
* Re: generic plans and "initial" pruning
@ 2026-04-04 12:10 Amit Langote <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2026-04-04 12:10 UTC (permalink / raw)
To: Chao Li <[email protected]>; +Cc: Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers; Thom Brown <[email protected]>
Attached is a redesigned version. While working on the previous
design, I grew increasingly uncomfortable with CachedPlanPrepData --
it was smuggling executor state out of GetCachedPlan() through an
out-parameter, which papered over the real problem: GetCachedPlan()
was doing too much. The main change in this version is architectural:
GetCachedPlan() no longer acquires execution locks. Callers now own
that responsibility, which is natural because each call site iterates
stmt_list differently and manages execution state in its own way --
and it lets them choose between conservative lock-all and
pruning-aware locking where appropriate.
Non-portal call sites remain on the conservative path for now.
_SPI_execute_plan() requires care around snapshot setup, which happens
after plan fetch rather than before. SQL functions have a different
issue: init_execution_state() fetches the plan while postquel_start()
handles execution, with execution_state containers in between, making
it harder to thread a prepped QueryDesc through. The portal path and
EXPLAIN EXECUTE cover the most common
prepared-statement-with-partitions workloads; the remaining sites can
be converted incrementally.
This is now starting to feel closer to what Tom suggested back in
January 2023 [1], where he proposed getting rid of
AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing
lock acquisition out to callers. He noted that "we'd be pushing the
responsibility for looping back and re-planning out to fairly
high-level calling code" and that "we'd definitely be changing some
fundamental APIs." That is the direction I came around to over the
last couple of weeks while wrestling with CachedPlanPrepData. The
reverted approach also tried to follow Tom's direction but moved
locking into ExecutorStart(), which forced it to handle plan
invalidation from inside the executor by mutating the CachedPlan
in-place. This version moves locking out to the callers instead, so
the executor and plan cache never reach into each other.
The series is now four patches:
0001: Move execution lock acquisition out of GetCachedPlan(). Adds
AcquireExecutorLocks() as a caller-facing function with validity check
and retry. Adds PortalLockCachedPlan() in pquery.c to centralize the
portal retry logic. All callers are converted. No behavioral change.
0002: Refactor executor's initial partition pruning setup. Cleanup
only, no behavioral change.
0003: Introduce ExecutorPrep() and refactor executor startup. Factors
range table init, permission checks, and initial pruning out of
InitPlan(). Scaffolding for 0004; all callers still go through the
normal ExecutorStart() path.
0004: Use pruning-aware locking for single-statement cached plans.
Adds ExecutorPrepAndLock() which locks unprunable relations, runs
ExecutorPrep() to determine surviving partitions, then locks only
those. Extends PortalLockCachedPlan() with a pruning-aware path for
eligible plans. Multi-statement CachedPlans (from rule rewriting)
always use conservative locking. In principle, this could be relaxed
if the planner can prove that no pruning expression reads state
modified by an earlier statement, but that is left for a future patch.
Includes regression tests.
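For readers who want the shape of the 0004 locking sequence without reading the diff, here is a self-contained toy model in C. It is a sketch under stated assumptions, not the patch's code: ToyPlan, ToyPruneFn and the toy_* functions are hypothetical, relations are modeled as bits in a mask, and "locking" just sets bits. It only mirrors the order described above: lock unprunable relations, run pruning, lock the survivors plus the first result relations, and check plan validity after each step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not a PostgreSQL API. */
typedef struct ToyPlan
{
	uint32_t	all_relids;			/* every relation in the range table */
	uint32_t	unprunable_relids;	/* relations pruning can never remove */
	uint32_t	first_result_rels;	/* kept locked even when pruned */
	bool		is_valid;			/* cleared by simulated invalidation */
} ToyPlan;

/* A pruning callback returns the set of relations that survive pruning. */
typedef uint32_t (*ToyPruneFn) (ToyPlan *plan, int param);

/* Models "where a = $1" over list partitions 1..3 stored in bits 1..3. */
static uint32_t
toy_prune_eq(ToyPlan *plan, int param)
{
	uint32_t	survivors = plan->unprunable_relids;

	if (param >= 1 && param <= 3)
		survivors |= (uint32_t) 1 << param;
	return survivors;
}

/* Models a pruning expression whose side effect invalidates the plan. */
static uint32_t
toy_prune_invalidating(ToyPlan *plan, int param)
{
	(void) param;
	plan->is_valid = false;
	return plan->unprunable_relids;
}

/*
 * Returns true with *locked holding the acquired set, or false after
 * releasing everything this attempt took.  The toy assumes the caller
 * held none of these locks beforehand.
 */
static bool
toy_prep_and_lock(ToyPlan *plan, ToyPruneFn prune, int param,
				  uint32_t *locked)
{
	uint32_t	taken;
	uint32_t	survivors;

	assert(plan != NULL && prune != NULL && locked != NULL);

	/* Step 1: lock relations that pruning cannot remove. */
	taken = plan->unprunable_relids;
	*locked |= taken;
	if (!plan->is_valid)
		goto fail;

	/* Step 2: run initial pruning while only those locks are held. */
	survivors = prune(plan, param) & plan->all_relids;
	if (!plan->is_valid)
		goto fail;

	/* Step 3: lock survivors, plus first result rels even if pruned. */
	taken |= survivors | plan->first_result_rels;
	*locked |= taken;
	if (!plan->is_valid)
		goto fail;

	return true;

fail:
	*locked &= ~taken;			/* release what this attempt acquired */
	return false;
}
```

In this model a false return is the caller's cue to discard the prep state, release the cached plan, fetch a fresh one and try again, which is the role the portal retry logic plays in the description above.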
In case it's not clear, I'm not targeting v19 at this point. I'd like
to get this into v20 CF1 and would welcome review from anyone
interested.
--
Thanks,
Amit Langote
[1] https://www.postgresql.org/message-id/4191508.1674157166%40sss.pgh.pa.us
Attachments:
[application/octet-stream] v11-0004-Use-pruning-aware-locking-for-single-statement-c.patch (40.3K, 2-v11-0004-Use-pruning-aware-locking-for-single-statement-c.patch)
From f586635ab49f3027546a7bda4c4f6017b946f333 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 20:43:14 +0900
Subject: [PATCH v11 4/4] Use pruning-aware locking for single-statement cached
plans
For single-statement reused generic plans, perform initial partition
pruning before acquiring execution locks, then lock only the
surviving partitions.
Add ExecutorPrepAndLock() which encapsulates the pruning-aware lock
sequence: lock unprunable relations, call ExecutorPrep() to run
initial pruning, then lock survivors. Plan validity is checked
after each step; ExecutorPrepCleanup() handles the case where the
plan is invalidated between prep and execution.
Extend PortalLockCachedPlan() to use the pruning-aware path for
eligible plans (single-statement reused generic, non-utility).
All other cases continue using the conservative lock-all path
from the previous commit.
Track firstResultRels in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving ExecInitModifyTable()
assumptions about the first result relation being available.
Multi-statement CachedPlans (from rule rewriting) always use
conservative locking, since PortalRunMulti() executes statements
sequentially with CCI between them and later statements' pruning
expressions may depend on earlier ones' effects. In principle,
this could be relaxed if the planner can prove that no pruning
expression reads state modified by an earlier statement, but that
is left for a future patch.
Regression tests are included to verify:
- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.
- Multi-statement CachedPlans (from rule rewriting) fall back to
locking all partitions, avoiding stale pruning results.
Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this
was an optimization; it is now a correctness requirement, because
pruned partitions may not be locked.
---
src/backend/commands/explain.c | 45 +++--
src/backend/commands/prepare.c | 30 ++-
src/backend/executor/execMain.c | 142 ++++++++++++++
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 18 ++
src/backend/tcop/pquery.c | 54 ++++-
src/backend/utils/cache/plancache.c | 16 ++
src/include/commands/explain.h | 3 +-
src/include/executor/executor.h | 4 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 2 +
src/test/regress/expected/partition_prune.out | 184 ++++++++++++++++++
src/test/regress/expected/plancache.out | 63 ++++++
src/test/regress/sql/partition_prune.sql | 116 +++++++++++
src/test/regress/sql/plancache.sql | 52 +++++
17 files changed, 720 insertions(+), 28 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index e4b70166b0e..60cd912ace1 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -374,7 +374,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ NULL);
}
/*
@@ -498,7 +499,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters)
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -527,13 +529,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
/*
* We discard the output if we have no use for it. If we're explaining
* CREATE TABLE AS, we'd better use the appropriate tuple receiver, while
@@ -549,10 +544,34 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
else
dest = None_Receiver;
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ /*
+ * Create a QueryDesc for the query, or use the one provided by the
+ * caller. When reusing a prep QueryDesc, its snapshot was set at
+ * creation time; we push it as active for ExecutorStart and override the
+ * destination and instrument options, which were not known when the
+ * caller created it.
+ */
+ if (prep_qd)
+ {
+ PushActiveSnapshot(GetActiveSnapshot());
+ queryDesc = prep_qd;
+ Assert(queryDesc->dest == None_Receiver);
+ queryDesc->dest = dest;
+ queryDesc->instrument_options = instrument_option;
+ }
+ else
+ {
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+ queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv,
+ instrument_option);
+ }
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 03d7a98fc58..3bbbc052149 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -588,6 +588,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ QueryDesc *prep_qd = NULL;
if (es->memory)
{
@@ -640,8 +641,31 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
pstate->p_queryEnv);
plan_list = cplan->stmt_list;
- if (AcquireExecutorLocks(cplan))
+ if (!CachedPlanCanPrep(cplan, entry->plansource))
+ {
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ continue;
+ }
+
+ prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, plan_list),
+ query_string,
+ GetActiveSnapshot(),
+ InvalidSnapshot,
+ None_Receiver, /* ExplainOnePlan will fix */
+ paramLI,
+ pstate->p_queryEnv,
+ 0 /* ExplainOnePlan will fix */ );
+ if (ExecutorPrepAndLock(prep_qd,
+ CurrentResourceOwner,
+ es->generic ? EXEC_FLAG_EXPLAIN_GENERIC : 0,
+ &cplan->is_valid))
break;
+
+ /* Try again. */
+ ExecutorPrepCleanup(prep_qd);
+ FreeQueryDesc(prep_qd);
ReleaseCachedPlan(cplan, CurrentResourceOwner);
}
@@ -664,6 +688,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
+ Assert(prep_qd == NULL || list_length(plan_list) == 1);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
@@ -671,7 +696,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ prep_qd);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, paramLI);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 735c80e08a9..7333c0f66d5 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -324,6 +324,124 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * LockRangeTableRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksPrepared().
+ */
+static void
+LockRangeTableRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not fail
+ * if it's been dropped entirely --- we'll just transiently acquire a
+ * non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksPrepared
+ *
+ * Acquire or release execution locks using pruning results already computed
+ * by ExecutorPrep() and stored in queryDesc->estate.
+ *
+ * This is intended for single-statement reused generic-plan paths that
+ * choose pruning-aware locking instead of the conservative
+ * AcquireExecutorLocks() path.
+ */
+static void
+AcquireExecutorLocksPrepared(QueryDesc *queryDesc, bool acquire)
+{
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ EState *estate = queryDesc->estate;
+ Bitmapset *lock_relids;
+ ListCell *lc;
+
+ Assert(queryDesc != NULL);
+ Assert(estate != NULL);
+ Assert(plannedstmt != NULL);
+ Assert(plannedstmt->commandType != CMD_UTILITY);
+
+ lock_relids = bms_difference(estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * Keep the first result relation of each ModifyTable locked even if
+ * pruning removed all target partitions. ExecInitModifyTable() relies on
+ * one such relation remaining available.
+ */
+ foreach(lc, plannedstmt->firstResultRels)
+ {
+ Index rti = lfirst_int(lc);
+
+ lock_relids = bms_add_member(lock_relids, rti);
+ }
+
+ LockRangeTableRelids(plannedstmt->rtable, lock_relids, acquire);
+
+ bms_free(lock_relids);
+
+}
+
+/*
+ * ExecutorPrepAndLock
+ * Perform pruning-aware locking for a single PlannedStmt.
+ *
+ * Locks unprunable relations first, then runs ExecutorPrep() to
+ * determine which partitions survive initial pruning, then locks
+ * only those survivors. Checks *is_valid after each locking step
+ * to detect plan invalidation (e.g., from concurrent DDL or DDL
+ * triggered by a pruning expression).
+ *
+ * Returns true if the plan is still valid and all needed locks are
+ * held. Returns false if the plan was invalidated at any point, in
+ * which case all acquired locks have been released and the caller
+ * should discard the QueryDesc and retry with a fresh plan.
+ */
+bool
+ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid)
+{
+ PlannedStmt *pstmt = queryDesc->plannedstmt;
+
+ /* Lock unprunable rels before pruning can access them. */
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, true);
+ if (!*is_valid)
+ {
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ /* Run pruning and lock survivors. */
+ ExecutorPrep(queryDesc, owner, eflags);
+ AcquireExecutorLocksPrepared(queryDesc, true);
+ if (!*is_valid)
+ {
+ AcquireExecutorLocksPrepared(queryDesc, false);
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ return true;
+}
+
/*
* ExecutorPrep
*
@@ -382,6 +500,30 @@ ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
CurrentResourceOwner = oldowner;
}
+/*
+ * ExecutorPrepCleanup
+ * Clean up an EState that was created by ExecutorPrep() but never
+ * passed to ExecutorStart(). This happens when the plan is
+ * invalidated between prep and execution, and the caller must
+ * discard the prepped state before retrying with a fresh plan.
+ *
+ * Unlike ExecutorEnd(), this does not expect a fully initialized
+ * plan state tree -- only the range table relations and the
+ * EState itself need to be freed.
+ */
+void
+ExecutorPrepCleanup(QueryDesc *queryDesc)
+{
+ EState *estate = queryDesc->estate;
+
+ if (estate == NULL)
+ return;
+
+ ExecCloseRangeTableRelations(estate);
+ FreeExecutorState(estate);
+ queryDesc->estate = NULL;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index dfd7b33aa9b..8bc5c36e09d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -5112,8 +5112,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksPrepared(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -5127,6 +5127,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 4ec76ce31a9..ace1cbacc91 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -657,6 +657,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ff0e875f2a2..6ee51f06920 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,24 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of initially
+ * prunable relations. We use bms_next_member() to get the
+ * lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition expansion
+ * preserves RT index order. ExecInitModifyTable() asserts that the
+ * recorded index matches what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 1b22515d56e..af732821139 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,7 +59,9 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
-static bool PortalLockCachedPlan(Portal portal);
+static bool PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **queryDesc_p);
/*
@@ -492,9 +494,14 @@ restart:
* the destination to DestNone.
*
* If the portal is backed by a cached plan, acquire execution
- * locks via PortalLockCachedPlan(). If the plan is
- * invalidated during locking, it replans and may change the
- * portal strategy, requiring us to restart PortalStart().
+ * locks via PortalLockCachedPlan(). For eligible plans
+ * (single-statement reused generic), this performs
+ * pruning-aware locking: it runs ExecutorPrep() on the
+ * QueryDesc to determine which partitions survive initial
+ * pruning, then locks only those. If the plan is invalidated
+ * during this process, it replans and rebuilds the QueryDesc.
+ * If replanning changes the portal strategy, we must restart
+ * PortalStart() to redispatch.
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
portal->sourceText,
@@ -506,7 +513,7 @@ restart:
0);
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, true, params, &queryDesc))
{
PopActiveSnapshot();
goto restart;
@@ -552,7 +559,7 @@ restart:
case PORTAL_ONE_MOD_WITH:
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -608,7 +615,7 @@ restart:
*/
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -1825,15 +1832,32 @@ EnsurePortalSnapshotExists(void)
* Acquire execution locks for a cached-plan-backed portal,
* retrying with a fresh plan if the current one is invalidated.
*
+ * If do_prep is true and the plan is eligible (single-statement reused
+ * generic plan), performs pruning-aware locking via ExecutorPrepAndLock()
+ * on *prep_qd, rebuilding *prep_qd if the plan had to be replanned.
+ * Otherwise falls back to locking all relations in the plan.
+ *
* Returns true if replanning changed portal->strategy, meaning the
- * caller must redispatch. Returns false once locks are held.
+ * caller must redispatch. Returns false once locks are held and the
+ * plan is valid for execution.
*/
static bool
-PortalLockCachedPlan(Portal portal)
+PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **prep_qd)
{
PortalStrategy start_strategy = portal->strategy;
- if (AcquireExecutorLocks(portal->cplan))
+ if (do_prep && CachedPlanCanPrep(portal->cplan, portal->plansource))
+ {
+ Assert(prep_qd);
+ if (ExecutorPrepAndLock(*prep_qd, portal->resowner, 0,
+ &portal->cplan->is_valid))
+ return false;
+ ExecutorPrepCleanup(*prep_qd);
+ FreeQueryDesc(*prep_qd);
+ }
+ else if (AcquireExecutorLocks(portal->cplan))
return false;
/* Replan. Locks will be taken freshly. */
@@ -1849,5 +1873,15 @@ PortalLockCachedPlan(Portal portal)
if (portal->strategy != start_strategy)
return true;
+ if (prep_qd)
+ {
+ Assert(list_length(portal->stmts) == 1);
+ *prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->sourceText,
+ GetActiveSnapshot(), InvalidSnapshot,
+ None_Receiver, params,
+ portal->queryEnv, 0);
+ }
+
return false;
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index f7fe366859c..fca2f84081e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1977,6 +1977,22 @@ AcquireExecutorLocks(CachedPlan *cplan)
return true;
}
+/*
+ * CachedPlanCanPrep
+ * Check whether a cached plan is eligible for pruning-aware locking
+ * via ExecutorPrepAndLock().
+ *
+ * Only single-statement reused generic plans with a non-utility command
+ * qualify.
+ */
+bool
+CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource)
+{
+ return (cplan == plansource->gplan &&
+ list_length(cplan->stmt_list) == 1 &&
+ linitial_node(PlannedStmt, cplan->stmt_list)->commandType != CMD_UTILITY);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..3a03355e6b6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -69,7 +69,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters);
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 491c4886506..fef5aadcdfa 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid);
+extern void ExecutorPrepCleanup(QueryDesc *queryDesc);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 693b879f76d..8753e05152b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 14a1dfed2b9..7f6f7cda781 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -120,6 +120,16 @@ typedef struct PlannedStmt
/* RT indexes of relations targeted by INSERT/UPDATE/DELETE/MERGE */
Bitmapset *resultRelationRelids;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksPrepared() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index e0fc403e717..2941d3a301b 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -254,4 +254,6 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+extern bool CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource);
+
#endif /* PLANCACHE_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index deacdd75807..61781389d2f 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4824,3 +4824,187 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+ a | b
+---+---
+ 1 | 0
+ 2 | 1
+(2 rows)
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index 4e59188196c..3043dbfac2d 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -398,3 +398,66 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- This exercises the ExecutorPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index d93c0c03bab..692415a8d9f 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1447,3 +1447,119 @@ select min(a) over (partition by a order by a) from part_abc where a >= stable_o
drop view part_abc_view;
drop table part_abc;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index 4b2f11dcc64..6a8b8787de6 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -223,3 +223,55 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- This exercises the ExecutorPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
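For readers following the locking protocol in ExecutorPrepAndLock() above, here is a minimal standalone sketch of the same three steps: lock the unprunable relations and recheck validity, run "initial" pruning, then lock only the survivors and recheck again. It is a toy model, not the patch's code. Every name in it (ToyPlan, ToyInvalHook, toy_prep_and_lock(), and so on) is hypothetical, relations are plain array slots, and the hook merely stands in for an invalidation noticed while taking a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NRELS 4             /* rel 0 is the parent; rels 1..3 are partitions */

/* Hypothetical stand-in for a cached generic plan. */
typedef struct ToyPlan
{
    bool        is_valid;               /* cleared by a simulated invalidation */
    bool        unprunable[TOY_NRELS];  /* rels that are always locked */
    int         bound[TOY_NRELS];       /* list-partition bound of prunable rels */
    int         first_result_rel;       /* kept locked even if pruned; -1 if none */
} ToyPlan;

/* Hypothetical callback modelling an invalidation seen while locking. */
typedef void (*ToyInvalHook) (ToyPlan *plan, int rel);

bool        toy_locked[TOY_NRELS];

ToyPlan
toy_make_plan(void)
{
    ToyPlan     plan = {
        .is_valid = true,
        .unprunable = {true, false, false, false},
        .bound = {0, 1, 2, 3},
        .first_result_rel = -1,
    };

    return plan;
}

int
toy_count_locked(void)
{
    int         n = 0;

    for (int i = 0; i < TOY_NRELS; i++)
        n += toy_locked[i] ? 1 : 0;
    return n;
}

void
toy_unlock_all(void)
{
    for (int i = 0; i < TOY_NRELS; i++)
        toy_locked[i] = false;
}

static void
toy_lock(ToyPlan *plan, int rel, ToyInvalHook hook)
{
    assert(rel >= 0 && rel < TOY_NRELS);
    toy_locked[rel] = true;
    if (hook != NULL)
        hook(plan, rel);        /* may clear plan->is_valid */
}

/* Example hook: pretend concurrent DDL on partition 2 invalidates the plan. */
void
toy_invalidate_on_rel2(ToyPlan *plan, int rel)
{
    if (rel == 2)
        plan->is_valid = false;
}

/* Returns true with locks held, or false with every lock released. */
bool
toy_prep_and_lock(ToyPlan *plan, int param, ToyInvalHook hook)
{
    bool        survives[TOY_NRELS] = {false};

    /* Step 1: lock unprunable rels, then recheck validity. */
    for (int i = 0; i < TOY_NRELS; i++)
        if (plan->unprunable[i])
            toy_lock(plan, i, hook);
    if (!plan->is_valid)
    {
        toy_unlock_all();
        return false;
    }

    /* Step 2: "initial pruning" keeps partitions whose bound matches. */
    for (int i = 0; i < TOY_NRELS; i++)
        if (!plan->unprunable[i] && plan->bound[i] == param)
            survives[i] = true;
    if (plan->first_result_rel >= 0)
        survives[plan->first_result_rel] = true;

    /* Step 3: lock only the survivors, then recheck validity again. */
    for (int i = 0; i < TOY_NRELS; i++)
        if (survives[i])
            toy_lock(plan, i, hook);
    if (!plan->is_valid)
    {
        toy_unlock_all();
        return false;
    }
    return true;
}
```

A caller would treat a false return the way the patch's callers do: discard the prepared state, get a fresh plan, and retry. Setting first_result_rel in the toy plan mimics the firstResultRels handling, where one result relation stays locked even when pruning removes every target partition.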
[application/octet-stream] v11-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch (8.9K, 3-v11-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch)
From 1b9f7861d7162f5b20f69ea9db5dda13f64c202e Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 26 Mar 2026 16:08:46 +0900
Subject: [PATCH v11 3/4] Introduce ExecutorPrep and refactor executor startup
Move permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper.
ExecutorStart() invokes ExecutorPrep() when QueryDesc->estate is
NULL, keeping current behavior unchanged. If QueryDesc->estate is
already set, ExecutorStart() reuses it.
This is preparatory refactoring only. No caller outside the
executor supplies a prebuilt EState in this commit.
In assert builds, verify that the expected relation locks are held
when entering ExecutorStart().
---
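As a rough illustration of the control flow described in the commit message, the sketch below models a start routine that runs the prep step only when the descriptor carries no state yet, and otherwise reuses what an earlier prep call built. It is a standalone toy under stated assumptions: ToyQueryDesc, ToyEState, toy_executor_prep(), toy_executor_start() and toy_prep_cleanup() are hypothetical names, and the "prep" work is reduced to setting a flag and bumping a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the executor state built by the prep step. */
typedef struct ToyEState
{
    bool        pruning_done;
} ToyEState;

/* Hypothetical stand-in for a query descriptor. */
typedef struct ToyQueryDesc
{
    ToyEState  *estate;         /* NULL until the prep step has run */
    bool        started;
} ToyQueryDesc;

int         toy_prep_calls = 0; /* counts how often prep actually ran */

/* Build the state: range table setup, permission checks, pruning (elided). */
void
toy_executor_prep(ToyQueryDesc *qd)
{
    assert(qd != NULL && qd->estate == NULL);
    qd->estate = calloc(1, sizeof(ToyEState));
    assert(qd->estate != NULL);
    qd->estate->pruning_done = true;
    toy_prep_calls++;
}

/* Start execution, running prep implicitly only if nobody did it earlier. */
void
toy_executor_start(ToyQueryDesc *qd)
{
    assert(qd != NULL);
    if (qd->estate == NULL)
        toy_executor_prep(qd);
    qd->started = true;
}

/* Discard state that was prepared but never started (e.g. before a retry). */
void
toy_prep_cleanup(ToyQueryDesc *qd)
{
    if (qd->estate == NULL)
        return;
    free(qd->estate);
    qd->estate = NULL;
}
```

In this model a caller that never preps sees the old behavior, since the start routine does the work itself, while a caller that prepped earlier pays for it only once.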
src/backend/executor/README | 10 ++-
src/backend/executor/execMain.c | 152 ++++++++++++++++++++++++++------
src/include/executor/execdesc.h | 2 +-
3 files changed, 132 insertions(+), 32 deletions(-)
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..890bc3d9333 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,17 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart, or implicitly from ExecutorStart
+ if not done earlier. Creates the EState in QueryDesc, performs
+ range table initialization, permission checks, and initial
+ partition pruning.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if QueryDesc.estate is NULL)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 45e00c6af85..735c80e08a9 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -76,6 +77,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
+static void ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags);
static void InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
@@ -147,7 +149,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +174,67 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When no
+ * prep EState was provided, AcquireExecutorLocks() should have locked
+ * every relation in the plan. When one was provided, pruning-aware
+ * locking should have locked at least the unpruned relations. Both
+ * checks are skipped in parallel workers, which acquire relation locks
+ * lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ ExecutorPrep(queryDesc, CurrentResourceOwner, eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking should
+ * have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -265,6 +324,64 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep
+ *
+ * Build the initial executor state for queryDesc before ExecutorStart().
+ *
+ * This creates the EState and performs the subset of executor startup that
+ * does not require plan-tree initialization, allowing that work to be reused
+ * by callers that need executor state before ExecutorStart():
+ *
+ * - initialize the range table
+ * - perform permission checks
+ * - perform initial partition pruning
+ *
+ * On success, queryDesc->estate is set and can later be reused by
+ * ExecutorStart() instead of rebuilding the same state.
+ *
+ * Caller must ensure that queryDesc->snapshot is active.
+ */
+static void
+ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ PlannedStmt *pstmt;
+
+ Assert(queryDesc != NULL);
+
+ if (queryDesc->operation == CMD_UTILITY)
+ return;
+
+ Assert(ActiveSnapshotSet());
+ Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ Assert(queryDesc->estate == NULL);
+
+ pstmt = queryDesc->plannedstmt;
+
+ estate = CreateExecutorState();
+ queryDesc->estate = estate;
+
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = queryDesc->params;
+ estate->es_queryEnv = queryDesc->queryEnv;
+ estate->es_top_eflags = eflags;
+
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -840,37 +957,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index d3a57242844..27697760bb9 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -43,7 +43,7 @@ typedef struct QueryDesc
QueryEnvironment *queryEnv; /* query environment passed in */
int instrument_options; /* OR of InstrumentOption flags */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
--
2.47.3
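As a reading aid for the 0003 patch above, the "prep once, reuse in start" contract can be modelled in standalone C. This is only a sketch under stated assumptions: ToyEState, ToyQueryDesc, toy_executor_prep() and toy_executor_start() are hypothetical stand-ins invented for illustration, not PostgreSQL APIs, and the counter merely stands in for range table setup, permission checks and initial pruning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for EState: counts how often prep work ran. */
typedef struct ToyEState
{
    int prep_runs;          /* models range table init + initial pruning */
} ToyEState;

/* Hypothetical stand-in for QueryDesc. */
typedef struct ToyQueryDesc
{
    ToyEState *estate;      /* NULL until prep has been done */
    bool       planstate_built;
} ToyQueryDesc;

/* Models ExecutorPrep(): must not be called twice on the same desc. */
static void
toy_executor_prep(ToyQueryDesc *qd)
{
    assert(qd->estate == NULL);

    qd->estate = calloc(1, sizeof(ToyEState));
    if (qd->estate == NULL)
        abort();
    qd->estate->prep_runs++;
}

/* Models ExecutorStart(): prep only if nobody did it earlier. */
static void
toy_executor_start(ToyQueryDesc *qd)
{
    if (qd->estate == NULL)
        toy_executor_prep(qd);

    /* Plan-tree initialization would happen here, using qd->estate. */
    qd->planstate_built = true;
}
```

Calling toy_executor_start() on a fresh descriptor runs the prep work implicitly; calling toy_executor_prep() first and toy_executor_start() afterwards leaves the same ToyEState in place with the prep work done exactly once. In the 0003 patch itself ExecutorPrep() is still static, so only the implicit path is exercised there.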
[application/octet-stream] v11-0001-Move-execution-lock-acquisition-out-of-GetCached.patch (16.4K, 4-v11-0001-Move-execution-lock-acquisition-out-of-GetCached.patch)
From 8dc44320c7d4b20f50200d7b21c98e4058b8d6d7 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 18:38:34 +0900
Subject: [PATCH v11 1/4] Move execution lock acquisition out of
GetCachedPlan()
GetCachedPlan() previously acquired execution locks on all plan
relations as part of cached plan validation. Move this
responsibility to callers, making GetCachedPlan() return a valid
plan without holding execution locks.
Add AcquireExecutorLocks() as the caller-facing function: it locks
all relations in the plan, checks that the plan is still valid
afterward, and returns false if it was invalidated so the caller
can retry with a fresh plan.
For portal-backed callers, add PortalLockCachedPlan() in pquery.c
which wraps the lock-check-retry loop and handles the case where
replanning changes the portal strategy. Store the CachedPlanSource
pointer in PortalData so retry can call GetCachedPlan() without
the caller threading it through.
Adjust all non-portal GetCachedPlan() callers (SPI, EXPLAIN
EXECUTE, SQL functions) to call AcquireExecutorLocks() explicitly
after fetching the plan.
No behavioral change. This separates plan retrieval from execution
setup, allowing a later commit to substitute pruning-aware locking
for eligible plans.
---
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 14 +++++-
src/backend/executor/functions.c | 14 ++++--
src/backend/executor/spi.c | 22 ++++++++--
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 68 ++++++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 44 ++++++++++++++-----
src/backend/utils/mmgr/portalmem.c | 7 +++
src/include/utils/plancache.h | 1 +
src/include/utils/portal.h | 3 ++
10 files changed, 155 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..cf5deec4943 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..03d7a98fc58 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ entry->plansource,
cplan);
/*
@@ -632,8 +633,17 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
- cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ for (;;)
+ {
+ cplan = GetCachedPlan(entry->plansource, paramLI,
+ CurrentResourceOwner,
+ pstate->p_queryEnv);
+ plan_list = cplan->stmt_list;
+
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ }
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..2afb814a435 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -654,6 +654,7 @@ static bool
init_execution_state(SQLFunctionCachePtr fcache)
{
CachedPlanSource *plansource;
+ CachedPlan *cplan;
execution_state *preves = NULL;
execution_state *lasttages = NULL;
int nstmts;
@@ -696,10 +697,15 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
- fcache->cplan = GetCachedPlan(plansource,
- fcache->paramLI,
- fcache->cowner,
- NULL);
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, fcache->paramLI,
+ fcache->cowner, NULL);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, fcache->cowner);
+ }
+ fcache->cplan = cplan;
/*
* If necessary, make esarray[] bigger to hold the needed state.
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..268cd10bde8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ plansource,
cplan);
/*
@@ -2106,6 +2107,16 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
_SPI_current->queryEnv);
Assert(cplan == plansource->gplan);
+ if (!AcquireExecutorLocks(cplan))
+ {
+ /* Plan invalidated during locking; get a fresh one. */
+ ReleaseCachedPlan(cplan,
+ plan->saved ? CurrentResourceOwner : NULL);
+ cplan = GetCachedPlan(plansource, NULL,
+ plan->saved ? CurrentResourceOwner : NULL,
+ _SPI_current->queryEnv);
+ }
+
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
@@ -2574,9 +2585,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
- cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, options->params,
+ plan_owner, _SPI_current->queryEnv);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, plan_owner);
+ }
stmt_list = cplan->stmt_list;
/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 10be60011ad..aaebefcdf7a 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1231,6 +1231,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2030,6 +2031,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ psrc,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index d8fc75d0bb9..1b22515d56e 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,6 +59,7 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
+static bool PortalLockCachedPlan(Portal portal);
/*
@@ -462,6 +463,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
portal->strategy = ChoosePortalStrategy(portal->stmts);
+restart:
+
/*
* Fire her up according to the strategy
*/
@@ -487,6 +490,11 @@ PortalStart(Portal portal, ParamListInfo params,
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
+ *
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). If the plan is
+ * invalidated during locking, it replans and may change the
+ * portal strategy, requiring us to restart PortalStart().
*/
queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
portal->sourceText,
@@ -496,6 +504,14 @@ PortalStart(Portal portal, ParamListInfo params,
params,
portal->queryEnv,
0);
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
/*
* If it's a scrollable cursor, executor needs to support
@@ -534,6 +550,11 @@ PortalStart(Portal portal, ParamListInfo params,
case PORTAL_ONE_RETURNING:
case PORTAL_ONE_MOD_WITH:
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
/*
* We don't start the executor until we are told to run the
@@ -577,7 +598,20 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+
+ /*
+ * GetCachedPlan() no longer acquires execution locks, so we
+ * must do it here. Multi-statement plans always use
+ * conservative locking (all partitions locked); pruning-aware
+ * locking is not feasible because PortalRunMulti() executes
+ * statements sequentially with CCI between them.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -1785,3 +1819,35 @@ EnsurePortalSnapshotExists(void)
/* PushActiveSnapshotWithLevel might have copied the snapshot */
portal->portalSnapshot = GetActiveSnapshot();
}
+
+/*
+ * PortalLockCachedPlan
+ * Acquire execution locks for a cached-plan-backed portal,
+ * retrying with a fresh plan if the current one is invalidated.
+ *
+ * Returns true if replanning changed portal->strategy, meaning the
+ * caller must redispatch. Returns false once locks are held.
+ */
+static bool
+PortalLockCachedPlan(Portal portal)
+{
+ PortalStrategy start_strategy = portal->strategy;
+
+ if (AcquireExecutorLocks(portal->cplan))
+ return false;
+
+ /* Replan. Locks will be taken freshly. */
+ ReleaseCachedPlan(portal->cplan, portal->resowner);
+ portal->cplan = NULL;
+ portal->stmts = NIL;
+ portal->cplan = GetCachedPlan(portal->plansource,
+ portal->portalParams,
+ portal->resowner,
+ portal->queryEnv);
+ portal->stmts = portal->cplan->stmt_list;
+ portal->strategy = ChoosePortalStrategy(portal->stmts);
+ if (portal->strategy != start_strategy)
+ return true;
+
+ return false;
+}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..f7fe366859c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,7 +100,7 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksInt(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -945,8 +945,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, the generic plan may be reused as a valid cached
+ * plan. Any execution-time setup, including lock acquisition, is the
+ * caller's responsibility.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -983,8 +984,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
-
/*
* If plan was transient, check to see if TransactionXmin has
* advanced, and if so invalidate it.
@@ -1003,9 +1002,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
/* Successfully revalidated and locked the query. */
return true;
}
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
}
/*
@@ -1282,8 +1278,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid but no execution locks are held.
+ * The caller must call AcquireExecutorLocks() before executing.
+ * For freshly built plans (custom or new generic), the planner
+ * already holds the needed locks, so AcquireExecutorLocks() is
+ * redundant but harmless.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1906,9 +1905,11 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksInt(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1956,27 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * AcquireExecutorLocks
+ * Acquire execution locks on all relations in a cached plan.
+ *
+ * Returns true if the plan is still valid after locking. Returns
+ * false if the plan was invalidated while locks were being acquired,
+ * in which case the locks have been released and the caller should
+ * discard this plan and retry with a fresh one from GetCachedPlan().
+ */
+bool
+AcquireExecutorLocks(CachedPlan *cplan)
+{
+ AcquireExecutorLocksInt(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocksInt(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..613f3be30b3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -272,6 +272,10 @@ CreateNewPortal(void)
* the passed plan trees have adequate lifetime. Typically this is done by
* copying them into the portal's context.
*
+ * If plansource is provided, it is the CachedPlanSource that produced
+ * cplan. PortalLockCachedPlan() uses it to fetch a fresh plan if the
+ * current one is invalidated during execution lock acquisition.
+ *
* The caller is also responsible for ensuring that the passed prepStmtName
* (if not NULL) and sourceText have adequate lifetime.
*
@@ -286,6 +290,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +304,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->plansource = plansource;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
@@ -517,6 +523,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* drop cached plan reference, if any */
PortalReleaseCachedPlan(portal);
+ portal->plansource = NULL;
/*
* If portal has a snapshot protecting its data, release that. This needs
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..e0fc403e717 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -241,6 +241,7 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern bool AcquireExecutorLocks(CachedPlan *cplan);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..3af535362cd 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,8 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ CachedPlanSource *plansource; /* CachedPlanSource, for replanning on
+ * invalidation */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +242,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
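As a reading aid for the 0001 patch above, the lock-check-retry loop that callers now own can be modelled in standalone C. This is a toy model under stated assumptions: ToyPlan, ToySource and the toy_* functions are hypothetical stand-ins invented for illustration, not PostgreSQL APIs, and the pending_invalidations counter simply simulates invalidation messages that arrive while locks are being taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CachedPlan: only what the loop needs. */
typedef struct ToyPlan
{
    int  generation;        /* bumped on every replan */
    bool is_valid;          /* cleared by a simulated invalidation */
    bool locked;            /* true while execution locks are held */
} ToyPlan;

/* Hypothetical stand-in for CachedPlanSource. */
typedef struct ToySource
{
    ToyPlan plan;
    int     pending_invalidations;  /* arrive during lock acquisition */
} ToySource;

/* Models GetCachedPlan(): returns a valid plan, takes no execution locks. */
static ToyPlan *
toy_get_cached_plan(ToySource *src)
{
    if (!src->plan.is_valid)
    {
        src->plan.generation++;     /* replan */
        src->plan.is_valid = true;
    }
    return &src->plan;
}

/* Models AcquireExecutorLocks(): lock, recheck validity, unlock on failure. */
static bool
toy_acquire_executor_locks(ToySource *src, ToyPlan *plan)
{
    plan->locked = true;

    /* Taking a lock can process a concurrent invalidation. */
    if (src->pending_invalidations > 0)
    {
        src->pending_invalidations--;
        plan->is_valid = false;
    }

    if (!plan->is_valid)
    {
        plan->locked = false;       /* release the now-useless locks */
        return false;
    }
    return true;
}

/* The caller-side loop: fetch, lock, and retry if invalidated. */
static ToyPlan *
toy_fetch_locked_plan(ToySource *src)
{
    for (;;)
    {
        ToyPlan *plan = toy_get_cached_plan(src);

        if (toy_acquire_executor_locks(src, plan))
            return plan;
        /* The patched callers call ReleaseCachedPlan() at this point. */
    }
}
```

With two simulated invalidations queued, toy_fetch_locked_plan() replans twice and returns a locked, valid plan on the third attempt. The loop terminates in the model only because the counter is finite; the patch relies on the same shape of loop in the EXPLAIN EXECUTE, SQL function and _SPI_execute_plan call sites.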
[application/octet-stream] v11-0002-Refactor-executor-s-initial-partition-pruning-se.patch (7.3K, 5-v11-0002-Refactor-executor-s-initial-partition-pruning-se.patch)
From ddc05ba324ab0347b2219ead1740a14617029f30 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:38 +0900
Subject: [PATCH v11 2/4] Refactor executor's initial partition pruning setup
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
CreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called before PARAM_EXEC parameters are
set up. A later commit needs this when running pruning state setup
outside of InitPlan().
No behavioral change.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++++---------
1 file changed, 48 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..2a3af006f77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1978,7 +1977,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
* and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,29 +1995,31 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ Assert(estate->es_part_prune_results == NULL);
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
/*
* Perform initial pruning steps, if any, and save the result
- * bitmapset or NULL as described in the header comment.
+ * bitmapset or NULL as described in the header comment. RT indexes
+ * of surviving partitions would be added to validsubplan_rtis.
+ *
+ * Note that when do_initial_prune is false,
+ * CreatePartitionPruneState() would have already added the RT indexes
+ * of all leaf partitions to es_unpruned_relids directly.
*/
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2137,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2376,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,9 +2389,28 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
+ }
+ }
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
}
+#endif
}
j++;
@@ -2490,9 +2508,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2539,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
--
2.47.3
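As a reading aid for the 0002 patch above, the way the set of unpruned RT indexes is accumulated can be modelled in standalone C. This is a sketch under stated assumptions: a uint64_t bitmask stands in for a Bitmapset, and ToyPruneInfo, RTI() and the toy_* functions are hypothetical names invented for illustration, not PostgreSQL APIs. The "surviving" set is supplied as input here rather than computed from pruning steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit for range table index n (toy model supports indexes 0..63). */
#define RTI(n) ((uint64_t) 1 << (n))

/* Hypothetical stand-in for one PartitionPruneInfo entry. */
typedef struct ToyPruneInfo
{
    bool     has_initial_steps; /* plan contains initial pruning steps */
    bool     do_initial_prune;  /* false when steps are being skipped */
    uint64_t all_leaf_rtis;     /* RT indexes of all leaf partitions */
    uint64_t surviving_rtis;    /* leaves that survive initial pruning */
} ToyPruneInfo;

/*
 * Models the part of CreatePartitionPruneState() touched by the patch:
 * when initial steps exist but are skipped, record every leaf directly.
 * When no initial steps exist, the leaves must already be present.
 */
static void
toy_create_prune_state(uint64_t *unpruned, const ToyPruneInfo *info)
{
    if (info->has_initial_steps && !info->do_initial_prune)
        *unpruned |= info->all_leaf_rtis;
    else if (!info->has_initial_steps)
        assert((*unpruned & info->all_leaf_rtis) == info->all_leaf_rtis);
}

/* Models ExecDoInitialPruning(): returns the final unpruned set. */
static uint64_t
toy_do_initial_pruning(uint64_t unprunable,
                       const ToyPruneInfo *infos, size_t ninfos)
{
    uint64_t unpruned = unprunable;

    for (size_t i = 0; i < ninfos; i++)
    {
        uint64_t valid_rtis = 0;

        toy_create_prune_state(&unpruned, &infos[i]);

        if (infos[i].do_initial_prune)
            valid_rtis = infos[i].surviving_rtis;

        unpruned |= valid_rtis;
    }
    return unpruned;
}
```

For example, with RT index 1 unprunable, one entry whose leaves {2, 3, 4} are pruned down to {3}, and a second entry whose pruning is skipped over leaves {5, 6}, the result is {1, 3, 5, 6}. The caller no longer needs an out-parameter for the "all leaves" case, which is the simplification the commit message describes.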
* Re: generic plans and "initial" pruning
@ 2026-05-27 12:03 Thom Brown <[email protected]>
parent: Amit Langote <[email protected]>
From: Thom Brown @ 2026-05-27 12:03 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Chao Li <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
On Sat, 4 Apr 2026 at 13:11, Amit Langote <[email protected]> wrote:
>
> Attached is a redesigned version. While working on the previous
> design, I grew increasingly uncomfortable with CachedPlanPrepData --
> it was smuggling executor state out of GetCachedPlan() through an
> out-parameter, which papered over the real problem: GetCachedPlan()
> was doing too much. The main change in this version is architectural:
> GetCachedPlan() no longer acquires execution locks. Callers now own
> that responsibility, which is natural because each call site iterates
> stmt_list differently and manages execution state in its own way --
> and it lets them choose between conservative lock-all and
> pruning-aware locking where appropriate.
>
> Non-portal call sites remain on the conservative path for now.
> _SPI_execute_plan requires care around snapshot setup, which happens
> after plan fetch rather than before. SQL functions have a different
> issue: init_execution_state() fetches the plan while postquel_start()
> handles execution, with execution_state containers in between, making
> it harder to thread a prepped QueryDesc through. The portal path and
> EXPLAIN EXECUTE cover the most common
> prepared-statement-with-partitions workloads; the remaining sites can
> be converted incrementally.
>
> This is now starting to feel closer to what Tom suggested back in
> January 2023 [1], where he proposed getting rid of
> AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing
> lock acquisition out to callers. He noted that "we'd be pushing the
> responsibility for looping back and re-planning out to fairly
> high-level calling code" and that "we'd definitely be changing some
> fundamental APIs." That is the direction I came around to over the
> last couple of weeks while wrestling with CachedPlanPrepData. The
> reverted approach also tried to follow Tom's direction but moved
> locking into ExecutorStart(), which forced it to handle plan
> invalidation from inside the executor by mutating the CachedPlan
> in-place. This version moves locking out to the callers instead, so
> the executor and plan cache never reach into each other.
>
> The series is now four patches:
>
> 0001: Move execution lock acquisition out of GetCachedPlan(). Adds
> AcquireExecutorLocks() as a caller-facing function with validity check
> and retry. Adds PortalLockCachedPlan() in pquery.c to centralize the
> portal retry logic. All callers are converted. No behavioral change.
>
> 0002: Refactor executor's initial partition pruning setup. Cleanup
> only, no behavioral change.
>
> 0003: Introduce ExecutorPrep() and refactor executor startup. Factors
> range table init, permission checks, and initial pruning out of
> InitPlan(). Scaffolding for 0004; all callers still go through the
> normal ExecutorStart() path.
>
> 0004: Use pruning-aware locking for single-statement cached plans.
> Adds ExecutorPrepAndLock() which locks unprunable relations, runs
> ExecutorPrep() to determine surviving partitions, then locks only
> those. Extends PortalLockCachedPlan() with a pruning-aware path for
> eligible plans. Multi-statement CachedPlans (from rule rewriting)
> always use conservative locking. In principle, this could be relaxed
> if the planner can prove that no pruning expression reads state
> modified by an earlier statement, but that is left for a future patch.
> Includes regression tests.
>
> In case it's not clear, I'm not targeting v19 at this point. I'd like
> to get this into v20 CF1 and would welcome review from anyone
> interested.
After not having looked at this in close to 2 years, I thought I'd
give it another look. Not found any user-facing issues, and I'm liking
seeing so few locks in pg_locks. I can see that with pruning disabled,
the fallback works, pruning-aware locking is working via SPI through
plpgsql, running ALTER between executions and also invalidating
indexes force replans, and it's looking good.
I also think there might be a bug in patch 0001, though I'd
appreciate someone checking my reasoning, because I'm not fully
confident I've been diligent enough.
When PortalStart() opens a SELECT cursor that's backed by a cached
plan, it does roughly the following. It builds a queryDesc (an
executor-side struct), one of whose fields is a pointer into the plan
tree inside the portal's cached plan. Then it calls
PortalLockCachedPlan() to acquire the necessary locks, and finally
hands the queryDesc over to the executor.
My worry is about what happens if the cached plan turns out to be
stale, for instance because someone ran DDL on a referenced table. In
that case PortalLockCachedPlan() throws the old plan away (via
ReleaseCachedPlan) and fetches a freshly-built replacement, updating
the portal's own pointers to match. But the queryDesc from earlier
isn't touched. Its plan pointer still references the old, now-released
plan. From what I can see, once that old plan's last reference is
dropped its memory can be freed, which would leave the executor
reading from freed memory in the next step.
The bit I'm least sure about is whether the old plan's memory really
does get reclaimed straight away when its refcount hits zero. If
something keeps it alive longer then this isn't a bug, or at least not
as bad as I'm making out. I had a look but couldn't convince myself
either way from the code alone. To actually hit this you'd need a
cursor on a cached plan, plus an invalidation arriving in the small
window between the portal being set up and the cursor being opened.
The race condition is brief, and I've not been able to hit it in
testing.
The thing that got me thinking this is real: patch 0004 modifies
PortalLockCachedPlan() so that whenever it replans, it also rebuilds
the queryDesc. That's pretty much the fix I'd expect for this, which
makes me suspect somebody hit it at some point. But 0004 only applies
that fix on the new pruning-aware code path, and it was mentioned in
the thread that 0001 to 0003 might land before 0004. If so, master
would carry the bug in the gap between the two.
I suspect a way to deal with it would be to move the CreateQueryDesc
call in the SELECT case to after PortalLockCachedPlan() returns, which
is what the other portal strategies already seem to do. Alternatively,
you could bring 0004's changes in this area into 0001 and have
PortalLockCachedPlan() always rebuild the queryDesc when it replans.
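
To make the ordering concern concrete, here is a toy C model. It is not PostgreSQL code: ToyPlan, ToyPortal, ToyQueryDesc and every toy_* function are invented for illustration, and a "released" flag stands in for the memory actually being freed, which is exactly the part I'm unsure about. It only shows that a pointer captured before the replan keeps referring to the released plan, while one captured afterwards does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; none of these are the real PostgreSQL structs. */
typedef struct ToyPlan
{
    int  refcount;
    bool valid;     /* false once a simulated invalidation has arrived */
    bool released;  /* set when the last reference is dropped */
} ToyPlan;

typedef struct ToyPortal
{
    ToyPlan *cplan; /* plan the portal currently references */
} ToyPortal;

typedef struct ToyQueryDesc
{
    ToyPlan *plan;  /* pointer captured when the desc was built */
} ToyQueryDesc;

static ToyPlan plans[2]; /* [0] = stale plan, [1] = its replacement */

/* Start every scenario with the portal holding an already-stale plan. */
static void
toy_reset(ToyPortal *portal)
{
    plans[0] = (ToyPlan) {.refcount = 1, .valid = false, .released = false};
    plans[1] = (ToyPlan) {.refcount = 0, .valid = true, .released = false};
    portal->cplan = &plans[0];
}

/* Models the replan branch: drop the old plan, adopt the new one. */
static void
toy_lock_cached_plan(ToyPortal *portal)
{
    if (portal->cplan->valid)
        return;
    if (--portal->cplan->refcount == 0)
        portal->cplan->released = true; /* real code might free it here */
    portal->cplan = &plans[1];
    portal->cplan->refcount++;
}

static ToyQueryDesc
toy_create_query_desc(const ToyPortal *portal)
{
    return (ToyQueryDesc) {.plan = portal->cplan};
}

/* Ordering questioned above: build the desc first, then lock. */
bool
desc_is_stale_when_built_first(void)
{
    ToyPortal    portal;
    ToyQueryDesc qdesc;

    toy_reset(&portal);
    qdesc = toy_create_query_desc(&portal);
    toy_lock_cached_plan(&portal);
    return qdesc.plan->released;
}

/* Suggested ordering: lock (and possibly replan) first, then build. */
bool
desc_is_stale_when_built_after_lock(void)
{
    ToyPortal    portal;
    ToyQueryDesc qdesc;

    toy_reset(&portal);
    toy_lock_cached_plan(&portal);
    qdesc = toy_create_query_desc(&portal);
    return qdesc.plan->released;
}
```

In this model the first function reports a stale desc and the second does not. Whether the real plan memory is reclaimed at that point is the open question; the model only captures the pointer ordering.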
If I've got this wrong and there's some lifetime mechanism I missed
that keeps the old plan's memory alive, then it's a non-issue and I'm
misreading the code. If I have got it wrong, could you please add
comments to make what is going on clearer?
Regards
Thom
* Re: generic plans and "initial" pruning
@ 2026-05-28 08:13 Amit Langote <[email protected]>
parent: Thom Brown <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2026-05-28 08:13 UTC (permalink / raw)
To: Thom Brown <[email protected]>; +Cc: Chao Li <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
Hi Thom,
On Wed, May 27, 2026 at 9:03 PM Thom Brown <[email protected]> wrote:
>
> On Sat, 4 Apr 2026 at 13:11, Amit Langote <[email protected]> wrote:
> >
> > Attached is a redesigned version. While working on the previous
> > design, I grew increasingly uncomfortable with CachedPlanPrepData --
> > it was smuggling executor state out of GetCachedPlan() through an
> > out-parameter, which papered over the real problem: GetCachedPlan()
> > was doing too much. The main change in this version is architectural:
> > GetCachedPlan() no longer acquires execution locks. Callers now own
> > that responsibility, which is natural because each call site iterates
> > stmt_list differently and manages execution state in its own way --
> > and it lets them choose between conservative lock-all and
> > pruning-aware locking where appropriate.
> >
> > Non-portal call sites remain on the conservative path for now.
> > _SPI_execute_plan requires care around snapshot setup, which happens
> > after plan fetch rather than before. SQL functions have a different
> > issue: init_execution_state() fetches the plan while postquel_start()
> > handles execution, with execution_state containers in between, making
> > it harder to thread a prepped QueryDesc through. The portal path and
> > EXPLAIN EXECUTE cover the most common
> > prepared-statement-with-partitions workloads; the remaining sites can
> > be converted incrementally.
> >
> > This is now starting to feel closer to what Tom suggested back in
> > January 2023 [1], where he proposed getting rid of
> > AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing
> > lock acquisition out to callers. He noted that "we'd be pushing the
> > responsibility for looping back and re-planning out to fairly
> > high-level calling code" and that "we'd definitely be changing some
> > fundamental APIs." That is the direction I came around to over the
> > last couple of weeks while wrestling with CachedPlanPrepData. The
> > reverted approach also tried to follow Tom's direction but moved
> > locking into ExecutorStart(), which forced it to handle plan
> > invalidation from inside the executor by mutating the CachedPlan
> > in-place. This version moves locking out to the callers instead, so
> > the executor and plan cache never reach into each other.
> >
> > The series is now four patches:
> >
> > 0001: Move execution lock acquisition out of GetCachedPlan(). Adds
> > AcquireExecutorLocks() as a caller-facing function with validity check
> > and retry. Adds PortalLockCachedPlan() in pquery.c to centralize the
> > portal retry logic. All callers are converted. No behavioral change.
> >
> > 0002: Refactor executor's initial partition pruning setup. Cleanup
> > only, no behavioral change.
> >
> > 0003: Introduce ExecutorPrep() and refactor executor startup. Factors
> > range table init, permission checks, and initial pruning out of
> > InitPlan(). Scaffolding for 0004; all callers still go through the
> > normal ExecutorStart() path.
> >
> > 0004: Use pruning-aware locking for single-statement cached plans.
> > Adds ExecutorPrepAndLock() which locks unprunable relations, runs
> > ExecutorPrep() to determine surviving partitions, then locks only
> > those. Extends PortalLockCachedPlan() with a pruning-aware path for
> > eligible plans. Multi-statement CachedPlans (from rule rewriting)
> > always use conservative locking. In principle, this could be relaxed
> > if the planner can prove that no pruning expression reads state
> > modified by an earlier statement, but that is left for a future patch.
> > Includes regression tests.
> >
> > In case it's not clear, I'm not targeting v19 at this point. I'd like
> > to get this into v20 CF1 and would welcome review from anyone
> > interested.
>
> After not having looked at this in close to 2 years, I thought I'd
> give it another look.
Thanks for taking a look.
> Not found any user-facing issues, and I'm liking
> seeing so few locks in pg_locks. I can see that with pruning disabled,
> the fallback works, pruning-aware locking is working via SPI through
> plpgsql, running ALTER between executions and also invalidating
> indexes force replans, and it's looking good.
>
> I also think there might be a bug in patch 0001, though I'd
> appreciate someone checking my reasoning, because I'm not fully
> confident I've been diligent enough.
>
> When PortalStart() opens a SELECT cursor that's backed by a cached
> plan, it does roughly the following. It builds a queryDesc (an
> executor-side struct), one of whose fields is a pointer into the plan
> tree inside the portal's cached plan. Then it calls
> PortalLockCachedPlan() to acquire the necessary locks, and finally
> hands the queryDesc over to the executor.
>
> My worry is about what happens if the cached plan turns out to be
> stale, for instance because someone ran DDL on a referenced table. In
> that case PortalLockCachedPlan() throws the old plan away (via
> ReleaseCachedPlan) and fetches a freshly-built replacement, updating
> the portal's own pointers to match. But the queryDesc from earlier
> isn't touched. Its plan pointer still references the old, now-released
> plan. From what I can see, once that old plan's last reference is
> dropped its memory can be freed, which would leave the executor
> reading from freed memory in the next step.
>
> The bit I'm least sure about is whether the old plan's memory really
> does get reclaimed straight away when its refcount hits zero. If
> something keeps it alive longer then this isn't a bug, or at least not
> as bad as I'm making out. I had a look but couldn't convince myself
> either way from the code alone. To actually hit this you'd need a
> cursor on a cached plan, plus an invalidation arriving in the small
> window between the portal being set up and the cursor being opened.
> The race condition is brief, and I've not been able to hit it in
> testing.
>
> The thing that got me thinking this is real: patch 0004 modifies
> PortalLockCachedPlan() so that whenever it replans, it also rebuilds
> the queryDesc. That's pretty much the fix I'd expect for this, which
> makes me suspect somebody hit it at some point. But 0004 only applies
> that fix on the new pruning-aware code path, and it was mentioned in
> the thread that 0001 to 0003 might land before 0004. If so, master
> would carry the bug in the gap between the two.
>
> I suspect a way to deal with it would be to move the CreateQueryDesc
> call in the SELECT case to after PortalLockCachedPlan() returns, which
> is what the other portal strategies already seem to do. Alternatively,
> you could bring 0004's changes in this area into 0001 and have
> PortalLockCachedPlan() always rebuild the queryDesc when it replans.
>
> If I've got this wrong and there's some lifetime mechanism I missed
> that keeps the old plan's memory alive, then it's a non-issue and I'm
> misreading the code. If I have got it wrong, could you please add
> comments to make what is going on clearer?
It's a real bug.
You're right that if PortalLockCachedPlan() replans, the QueryDesc
created before the call still points at the old PlannedStmt from the
released plan. And yes, 0004 happens to fix it by rebuilding the
QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
broken on their own.
Attached is an updated set with the fix: CreateQueryDesc now runs
after PortalLockCachedPlan() returns, as you suggested. That said,
I'll probably focus first on settling the plancache refactoring that
spun off from this thread [1], and then start a new thread for the
pruning-aware locking work on top of it, incorporating parts of this
series.
--
Thanks, Amit Langote
[1] https://www.postgresql.org/message-id/CA%2BHiwqE1ntHy2h9zJ9v3MwAkoGAveSERcHWkDTTZnP0kxWqbKQ%40mail.g...
Attachments:
[application/octet-stream] v12-0001-Move-execution-lock-acquisition-out-of-GetCached.patch (16.2K, 2-v12-0001-Move-execution-lock-acquisition-out-of-GetCached.patch)
From a3214580f2ce1983a111af07ccb092ba03c812c8 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 18:38:34 +0900
Subject: [PATCH v12 1/4] Move execution lock acquisition out of
GetCachedPlan()
GetCachedPlan() previously acquired execution locks on all plan
relations as part of cached plan validation. Move this
responsibility to callers, making GetCachedPlan() return a valid
plan without holding execution locks.
Add AcquireExecutorLocks() as the caller-facing function: it locks
all relations in the plan, checks that the plan is still valid
afterward, and returns false if it was invalidated so the caller
can retry with a fresh plan.
For portal-backed callers, add PortalLockCachedPlan() in pquery.c
which wraps the lock-check-retry loop and handles the case where
replanning changes the portal strategy. Store the CachedPlanSource
pointer in PortalData so retry can call GetCachedPlan() without
the caller threading it through.
Adjust all non-portal GetCachedPlan() callers (SPI, EXPLAIN
EXECUTE, SQL functions) to call AcquireExecutorLocks() explicitly
after fetching the plan.
No behavioral change. This separates plan retrieval from execution
setup, allowing a later commit to substitute pruning-aware locking
for eligible plans.
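
The lock-check-retry contract described above can be sketched with a toy model. This is not the patch's code: ToyCachedPlan, toy_get_cached_plan() and toy_acquire_executor_locks() are hypothetical stand-ins, and a counter simulates concurrent invalidations arriving while locks are being taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a cached plan; not the real CachedPlan struct. */
typedef struct ToyCachedPlan
{
    int  generation;  /* which rebuild of the plan this is */
    bool is_valid;
    int  locks_held;
} ToyCachedPlan;

static ToyCachedPlan current;
static int next_generation;
static int pending_invalidations; /* simulated concurrent DDL events */

/* Stand-in for GetCachedPlan(): returns a valid plan, holds no locks. */
static ToyCachedPlan *
toy_get_cached_plan(void)
{
    if (!current.is_valid)
    {
        current.generation = ++next_generation;
        current.is_valid = true;
        current.locks_held = 0;
    }
    return &current;
}

/*
 * Stand-in for the caller-facing lock function: take the locks, recheck
 * validity, and release the locks again if the plan was invalidated.
 */
static bool
toy_acquire_executor_locks(ToyCachedPlan *cplan)
{
    cplan->locks_held = 1;

    /* Simulate an invalidation message processed while locking. */
    if (pending_invalidations > 0)
    {
        pending_invalidations--;
        cplan->is_valid = false;
    }

    if (!cplan->is_valid)
    {
        cplan->locks_held = 0;
        return false;
    }
    return true;
}

/*
 * Caller-side loop: fetch, lock, retry on invalidation.  Returns the
 * generation of the plan that ended up locked.
 */
int
toy_fetch_locked_plan(int invalidations)
{
    ToyCachedPlan *cplan;

    pending_invalidations = invalidations;
    next_generation = 0;
    current.is_valid = false;

    for (;;)
    {
        cplan = toy_get_cached_plan();
        if (toy_acquire_executor_locks(cplan))
            break;
        /* A real caller would drop its plan reference here. */
    }
    return cplan->generation;
}
```

With no simulated invalidations the first plan is locked; with two, the loop settles on the third rebuild.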
---
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 14 +++++-
src/backend/executor/functions.c | 14 ++++--
src/backend/executor/spi.c | 22 +++++++--
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 70 ++++++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 44 +++++++++++++-----
src/backend/utils/mmgr/portalmem.c | 7 +++
src/include/utils/plancache.h | 1 +
src/include/utils/portal.h | 3 ++
10 files changed, 157 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..cf5deec4943 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..03d7a98fc58 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ entry->plansource,
cplan);
/*
@@ -632,8 +633,17 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
- cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ for (;;)
+ {
+ cplan = GetCachedPlan(entry->plansource, paramLI,
+ CurrentResourceOwner,
+ pstate->p_queryEnv);
+ plan_list = cplan->stmt_list;
+
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ }
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..2afb814a435 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -654,6 +654,7 @@ static bool
init_execution_state(SQLFunctionCachePtr fcache)
{
CachedPlanSource *plansource;
+ CachedPlan *cplan;
execution_state *preves = NULL;
execution_state *lasttages = NULL;
int nstmts;
@@ -696,10 +697,15 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
- fcache->cplan = GetCachedPlan(plansource,
- fcache->paramLI,
- fcache->cowner,
- NULL);
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, fcache->paramLI,
+ fcache->cowner, NULL);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, fcache->cowner);
+ }
+ fcache->cplan = cplan;
/*
* If necessary, make esarray[] bigger to hold the needed state.
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..268cd10bde8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ plansource,
cplan);
/*
@@ -2106,6 +2107,16 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
_SPI_current->queryEnv);
Assert(cplan == plansource->gplan);
+ if (!AcquireExecutorLocks(cplan))
+ {
+ /* Plan invalidated during locking; get a fresh one. */
+ ReleaseCachedPlan(cplan,
+ plan->saved ? CurrentResourceOwner : NULL);
+ cplan = GetCachedPlan(plansource, NULL,
+ plan->saved ? CurrentResourceOwner : NULL,
+ _SPI_current->queryEnv);
+ }
+
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
@@ -2574,9 +2585,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
- cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, options->params,
+ plan_owner, _SPI_current->queryEnv);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, plan_owner);
+ }
stmt_list = cplan->stmt_list;
/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dbef734a93f..2929f158338 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1243,6 +1243,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2042,6 +2043,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ psrc,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index ee731000820..4699b53cab7 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,6 +59,7 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
+static bool PortalLockCachedPlan(Portal portal);
/*
@@ -463,6 +464,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
portal->strategy = ChoosePortalStrategy(portal->stmts);
+restart:
+
/*
* Fire her up according to the strategy
*/
@@ -485,6 +488,21 @@ PortalStart(Portal portal, ParamListInfo params,
* non-default nesting level for the snapshot.
*/
+ /*
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). If the plan is
+ * invalidated during locking, it replans and may change the
+ * portal strategy, requiring us to restart PortalStart().
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
+
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
@@ -535,6 +553,11 @@ PortalStart(Portal portal, ParamListInfo params,
case PORTAL_ONE_RETURNING:
case PORTAL_ONE_MOD_WITH:
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
/*
* We don't start the executor until we are told to run the
@@ -578,7 +601,20 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+
+ /*
+ * GetCachedPlan() no longer acquires execution locks, so we
+ * must do it here. Multi-statement plans always use
+ * conservative locking (all partitions locked); pruning-aware
+ * locking is not feasible because PortalRunMulti() executes
+ * statements sequentially with CCI between them.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -1786,3 +1822,35 @@ EnsurePortalSnapshotExists(void)
/* PushActiveSnapshotWithLevel might have copied the snapshot */
portal->portalSnapshot = GetActiveSnapshot();
}
+
+/*
+ * PortalLockCachedPlan
+ * Acquire execution locks for a cached-plan-backed portal,
+ * retrying with a fresh plan if the current one is invalidated.
+ *
+ * Returns true if replanning changed portal->strategy, meaning the
+ * caller must redispatch. Returns false once locks are held.
+ */
+static bool
+PortalLockCachedPlan(Portal portal)
+{
+ PortalStrategy start_strategy = portal->strategy;
+
+ if (AcquireExecutorLocks(portal->cplan))
+ return false;
+
+ /* Replan. Locks will be taken freshly. */
+ ReleaseCachedPlan(portal->cplan, portal->resowner);
+ portal->cplan = NULL;
+ portal->stmts = NIL;
+ portal->cplan = GetCachedPlan(portal->plansource,
+ portal->portalParams,
+ portal->resowner,
+ portal->queryEnv);
+ portal->stmts = portal->cplan->stmt_list;
+ portal->strategy = ChoosePortalStrategy(portal->stmts);
+ if (portal->strategy != start_strategy)
+ return true;
+
+ return false;
+}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..f7fe366859c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,7 +100,7 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksInt(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -945,8 +945,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, the generic plan may be reused as a valid cached
+ * plan. Any execution-time setup, including lock acquisition, is the
+ * caller's responsibility.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -983,8 +984,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
-
/*
* If plan was transient, check to see if TransactionXmin has
* advanced, and if so invalidate it.
@@ -1003,9 +1002,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
/* Successfully revalidated and locked the query. */
return true;
}
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
}
/*
@@ -1282,8 +1278,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid but no execution locks are held.
+ * The caller must call AcquireExecutorLocks() before executing.
+ * For freshly built plans (custom or new generic), the planner
+ * already holds the needed locks, so AcquireExecutorLocks() is
+ * redundant but harmless.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1906,9 +1905,11 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksInt(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1956,27 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * AcquireExecutorLocks
+ * Acquire execution locks on all relations in a cached plan.
+ *
+ * Returns true if the plan is still valid after locking. Returns
+ * false if the plan was invalidated while locks were being acquired,
+ * in which case the locks have been released and the caller should
+ * discard this plan and retry with a fresh one from GetCachedPlan().
+ */
+bool
+AcquireExecutorLocks(CachedPlan *cplan)
+{
+ AcquireExecutorLocksInt(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocksInt(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..613f3be30b3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -272,6 +272,10 @@ CreateNewPortal(void)
* the passed plan trees have adequate lifetime. Typically this is done by
* copying them into the portal's context.
*
+ * If plansource is provided, it is the CachedPlanSource that produced
+ * cplan. PortalLockCachedPlan() uses it to fetch a fresh plan if the
+ * current one is invalidated during execution lock acquisition.
+ *
* The caller is also responsible for ensuring that the passed prepStmtName
* (if not NULL) and sourceText have adequate lifetime.
*
@@ -286,6 +290,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +304,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->plansource = plansource;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
@@ -517,6 +523,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* drop cached plan reference, if any */
PortalReleaseCachedPlan(portal);
+ portal->plansource = NULL;
/*
* If portal has a snapshot protecting its data, release that. This needs
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..e0fc403e717 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -241,6 +241,7 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern bool AcquireExecutorLocks(CachedPlan *cplan);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..3af535362cd 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,8 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ CachedPlanSource *plansource; /* CachedPlanSource, for replanning on
+ * invalidation */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +242,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
[application/octet-stream] v12-0002-Refactor-executor-s-initial-partition-pruning-se.patch (7.3K, 3-v12-0002-Refactor-executor-s-initial-partition-pruning-se.patch)
From 29e5ad113f6974a94fbcf984b43fa3ed86f57632 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:38 +0900
Subject: [PATCH v12 2/4] Refactor executor's initial partition pruning setup
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
ExecCreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called before PARAM_EXEC parameters are
set up. A later commit needs this when running pruning state setup
outside of InitPlan().
No behavioral change.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++++---------
1 file changed, 48 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..2a3af006f77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1978,7 +1977,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
* and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,29 +1995,31 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ Assert(estate->es_part_prune_results == NULL);
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
/*
* Perform initial pruning steps, if any, and save the result
- * bitmapset or NULL as described in the header comment.
+ * bitmapset or NULL as described in the header comment. RT indexes
+ * of surviving partitions are added to validsubplan_rtis.
+ *
+ * Note that when do_initial_prune is false,
+ * CreatePartitionPruneState() has already added the RT indexes of
+ * all leaf partitions to es_unpruned_relids directly.
*/
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2137,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2376,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,9 +2389,28 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
+ }
+ }
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions must already be present in es_unpruned_relids
+ * when none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
}
+#endif
}
j++;
@@ -2490,9 +2508,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2539,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * PARAM_EXEC values might not be available yet when
+ * CreatePartitionPruneState() is called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
--
2.47.3
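The bookkeeping that 0002 moves around can be sketched outside the tree. The block below is a minimal standalone model, not PostgreSQL code: ToySet is a 64-bit mask standing in for Bitmapset, and toy_next_member(), toy_add_member() and toy_unpruned_relids() are hypothetical names that only mirror the bms_next_member()-style loop over present_parts and leafpart_rti_map seen in the patch. It also folds the two code paths (survivors added in ExecDoInitialPruning(), all leaf partitions added in CreatePartitionPruneState()) into one function for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a Bitmapset: bit i set means member i is present. */
typedef uint64_t ToySet;

/*
 * Return the smallest member greater than prev, or -1 if there is none.
 * This mirrors the calling convention of the loops in the patch, where
 * iteration starts with prev = -1.
 */
static int
toy_next_member(ToySet set, int prev)
{
	for (int i = prev + 1; i < 64; i++)
	{
		if (set & (UINT64_C(1) << i))
			return i;
	}
	return -1;
}

/* Return set with the given member added. */
static ToySet
toy_add_member(ToySet set, int member)
{
	assert(member >= 0 && member < 64);
	return set | (UINT64_C(1) << member);
}

/*
 * Hypothetical model of how the unpruned-relids set grows.
 *
 * rti_map[part_index] gives the RT index of a leaf partition, or 0 if the
 * partition has no scanning subnode.  The caller must make sure rti_map
 * covers every bit set in the two partition sets.
 *
 * When initial pruning runs, only surviving partitions contribute their RT
 * indexes; when it is skipped, every present partition does.
 */
ToySet
toy_unpruned_relids(ToySet unpruned, ToySet present_parts,
					ToySet surviving_parts, const int *rti_map,
					bool do_initial_prune)
{
	ToySet		parts = do_initial_prune ? surviving_parts : present_parts;
	int			part_index = -1;

	while ((part_index = toy_next_member(parts, part_index)) >= 0)
	{
		int			rtindex = rti_map[part_index];

		if (rtindex)
			unpruned = toy_add_member(unpruned, rtindex);
	}
	return unpruned;
}
```

With rti_map = {5, 0, 7, 9}, present partitions 0..3 and partition 2 as the only survivor, the pruning case yields just RT index 7, while the skipped-pruning case yields RT indexes 5, 7 and 9.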
[application/octet-stream] v12-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch (8.8K, 4-v12-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch)
From 05c92346e2bec4c8ec9a7cf45ec572c15d64481f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 26 Mar 2026 16:08:46 +0900
Subject: [PATCH v12 3/4] Introduce ExecutorPrep and refactor executor startup
Move permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper.

ExecutorStart() invokes ExecutorPrep() when QueryDesc->estate is
NULL, keeping current behavior unchanged. If QueryDesc->estate is
already set, ExecutorStart() reuses it.

This is preparatory refactoring only. No caller outside the
executor supplies a prebuilt EState in this commit.

In assert builds, verify that the expected relation locks are held
when entering ExecutorStart().
---
src/backend/executor/README | 10 ++-
src/backend/executor/execMain.c | 152 ++++++++++++++++++++++++++------
src/include/executor/execdesc.h | 2 +-
3 files changed, 132 insertions(+), 32 deletions(-)
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..890bc3d9333 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,17 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart, or implicitly from ExecutorStart
+ if not done earlier. Creates the EState in QueryDesc, performs
+ range table initialization, permission checks, and initial
+ partition pruning.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if QueryDesc.estate is NULL)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4b30f768680..2b9397b72f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -76,6 +77,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
+static void ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags);
static void InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
@@ -147,7 +149,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +174,67 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When no
+ * prep EState was provided, AcquireExecutorLocks() should have locked
+ * every relation in the plan. When one was provided, pruning-aware
+ * locking should have locked at least the unpruned relations. Both
+ * checks are skipped in parallel workers, which acquire relation locks
+ * lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ ExecutorPrep(queryDesc, CurrentResourceOwner, eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking should
+ * have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -274,6 +333,64 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep
+ *
+ * Build the initial executor state for queryDesc before ExecutorStart().
+ *
+ * This creates the EState and performs the subset of executor startup that
+ * does not require plan-tree initialization, allowing that work to be reused
+ * by callers that need executor state before ExecutorStart():
+ *
+ * - initialize the range table
+ * - perform permission checks
+ * - perform initial partition pruning
+ *
+ * On success, queryDesc->estate is set and can later be reused by
+ * ExecutorStart() instead of rebuilding the same state.
+ *
+ * Caller must ensure that queryDesc->snapshot is active.
+ */
+static void
+ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ PlannedStmt *pstmt;
+
+ Assert(queryDesc != NULL);
+
+ if (queryDesc->operation == CMD_UTILITY)
+ return;
+
+ Assert(ActiveSnapshotSet());
+ Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ Assert(queryDesc->estate == NULL);
+
+ pstmt = queryDesc->plannedstmt;
+
+ estate = CreateExecutorState();
+ queryDesc->estate = estate;
+
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = queryDesc->params;
+ estate->es_queryEnv = queryDesc->queryEnv;
+ estate->es_top_eflags = eflags;
+
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -849,37 +966,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 37c2576e4bc..aea5ec8ea02 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -45,7 +45,7 @@ typedef struct QueryDesc
int query_instr_options; /* OR of InstrumentOption flags for
* query_instr */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
--
2.47.3
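The control flow that 0003 introduces (prep either happens early or is done implicitly at start, but never twice) can be illustrated with a small standalone model. Everything here is an assumption made for illustration: ToyEState, ToyQueryDesc and the toy_* functions are hypothetical and carry none of the real executor state; the prep_calls counter exists only so the reuse can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy executor state; the real EState holds far more. */
typedef struct ToyEState
{
	int			initial_pruning_done;
} ToyEState;

/* Toy query descriptor with a counter to observe how often prep ran. */
typedef struct ToyQueryDesc
{
	ToyEState  *estate;
	int			prep_calls;
} ToyQueryDesc;

/*
 * Model of the prep step: build the state once.  As in the patch, calling
 * it on a descriptor that already has a state is a programming error.
 */
void
toy_executor_prep(ToyQueryDesc *qd)
{
	assert(qd != NULL);
	assert(qd->estate == NULL);

	qd->estate = calloc(1, sizeof(ToyEState));
	assert(qd->estate != NULL);
	qd->estate->initial_pruning_done = 1;
	qd->prep_calls++;
}

/*
 * Model of the start step: run prep only if nobody did it earlier,
 * otherwise reuse the state that is already attached.
 */
void
toy_executor_start(ToyQueryDesc *qd)
{
	assert(qd != NULL);

	if (qd->estate == NULL)
		toy_executor_prep(qd);

	assert(qd->estate != NULL);
	/* plan-tree initialization would follow here */
}

/* Model of cleanup for a state that is no longer needed. */
void
toy_executor_cleanup(ToyQueryDesc *qd)
{
	free(qd->estate);
	qd->estate = NULL;
}
```

Calling toy_executor_start() on a fresh descriptor runs prep once; calling toy_executor_prep() first and then toy_executor_start() leaves prep_calls at 1 and keeps the same state pointer, which is the reuse the commit message describes.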
[application/octet-stream] v12-0004-Use-pruning-aware-locking-for-single-statement-c.patch (40.7K, 5-v12-0004-Use-pruning-aware-locking-for-single-statement-c.patch)
From c68d5de848572defbb58625d915f3323245294d4 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 20:43:14 +0900
Subject: [PATCH v12 4/4] Use pruning-aware locking for single-statement cached
plans
For single-statement reused generic plans, perform initial partition
pruning before acquiring execution locks, then lock only the
surviving partitions.

Add ExecutorPrepAndLock() which encapsulates the pruning-aware lock
sequence: lock unprunable relations, call ExecutorPrep() to run
initial pruning, then lock survivors. Plan validity is checked
after each step; ExecutorPrepCleanup() handles the case where the
plan is invalidated between prep and execution.

Extend PortalLockCachedPlan() to use the pruning-aware path for
eligible plans (single-statement reused generic, non-utility).
All other cases continue using the conservative lock-all path
from the previous commit.

Track firstResultRels in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving ExecInitModifyTable()
assumptions about the first result relation being available.

Multi-statement CachedPlans (from rule rewriting) always use
conservative locking, since PortalRunMulti() executes statements
sequentially with CommandCounterIncrement() (CCI) between them and
later statements' pruning expressions may depend on earlier ones'
effects. In principle, this could be relaxed if the planner can
prove that no pruning expression reads state modified by an earlier
statement, but that is left for a future patch.

Regression tests are included to verify:
- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.
- Multi-statement CachedPlans (from rule rewriting) fall back to
locking all partitions, avoiding stale pruning results.

Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this
was an optimization; it is now a correctness requirement, because
pruned partitions may not be locked.
---
src/backend/commands/explain.c | 45 +++--
src/backend/commands/prepare.c | 30 ++-
src/backend/executor/execMain.c | 142 ++++++++++++++
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 18 ++
src/backend/tcop/pquery.c | 76 ++++++--
src/backend/utils/cache/plancache.c | 16 ++
src/include/commands/explain.h | 3 +-
src/include/executor/executor.h | 4 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 2 +
src/test/regress/expected/partition_prune.out | 184 ++++++++++++++++++
src/test/regress/expected/plancache.out | 63 ++++++
src/test/regress/sql/partition_prune.sql | 116 +++++++++++
src/test/regress/sql/plancache.sql | 52 +++++
17 files changed, 731 insertions(+), 39 deletions(-)
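The lock sequence described in the commit message above can be sketched as a standalone model. This is not the patch's code: ToyLockTable, ToyPlan and the toy_* functions are hypothetical, relation sets are plain bit masks indexed by RT index, and invalidation is simulated with an invalidate_after_step field instead of arriving through invalidation messages while locks are being taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_RELS 16

/* Per-relation lock counts, indexed by RT index. */
typedef struct ToyLockTable
{
	int			held[TOY_MAX_RELS];
} ToyLockTable;

typedef struct ToyPlan
{
	uint32_t	unprunable;			/* always locked first */
	uint32_t	survivors;			/* relations left after initial pruning */
	uint32_t	first_result_rels;	/* kept locked even if pruned */
	bool		is_valid;
	int			invalidate_after_step;	/* 0 = never, 1 or 2 = simulate
										 * invalidation after that step */
} ToyPlan;

/* Acquire or release one lock on every relation in relids. */
static void
toy_lock_set(ToyLockTable *locks, uint32_t relids, bool acquire)
{
	for (int i = 0; i < TOY_MAX_RELS; i++)
	{
		if (relids & (UINT32_C(1) << i))
			locks->held[i] += acquire ? 1 : -1;
	}
}

/* Total number of locks currently held. */
int
toy_locks_held(const ToyLockTable *locks)
{
	int			total = 0;

	for (int i = 0; i < TOY_MAX_RELS; i++)
		total += locks->held[i];
	return total;
}

/*
 * Two-step locking: unprunable relations first, then pruning survivors
 * plus the first result relations.  Validity is checked after each step;
 * on failure everything acquired so far is released and false is returned.
 */
bool
toy_prep_and_lock(ToyLockTable *locks, ToyPlan *plan)
{
	uint32_t	second;

	toy_lock_set(locks, plan->unprunable, true);
	if (plan->invalidate_after_step == 1)
		plan->is_valid = false;
	if (!plan->is_valid)
	{
		toy_lock_set(locks, plan->unprunable, false);
		return false;
	}

	/* "Pruning" has run; lock what survived and is not yet locked. */
	second = (plan->survivors & ~plan->unprunable) | plan->first_result_rels;
	toy_lock_set(locks, second, true);
	if (plan->invalidate_after_step == 2)
		plan->is_valid = false;
	if (!plan->is_valid)
	{
		toy_lock_set(locks, second, false);
		toy_lock_set(locks, plan->unprunable, false);
		return false;
	}

	return true;
}
```

For a parent at RT index 1 with partitions at 2, 3 and 4, where only partition 3 survives pruning and partition 2 is the first result relation, a valid plan ends up holding locks on 1, 2 and 3 but not 4. If the plan is invalidated after the second step, the function returns false with no locks left held, and a caller would then discard its prep state and replan.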
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 112c17b0d64..c5254f0f920 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -377,7 +377,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ NULL);
}
/*
@@ -501,7 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters)
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -532,13 +534,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
/*
* We discard the output if we have no use for it. If we're explaining
* CREATE TABLE AS, we'd better use the appropriate tuple receiver, while
@@ -554,10 +549,34 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
else
dest = None_Receiver;
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ /*
+ * Create a QueryDesc for the query, or use the one provided by the
+ * caller. When reusing a prep QueryDesc, its snapshot was set at
+ * creation time; we push it as active for ExecutorStart and override the
+ * destination and instrument options, which were not known when the
+ * caller created it.
+ */
+ if (prep_qd)
+ {
+ PushActiveSnapshot(GetActiveSnapshot());
+ queryDesc = prep_qd;
+ Assert(queryDesc->dest == None_Receiver);
+ queryDesc->dest = dest;
+ queryDesc->instrument_options = instrument_option;
+ }
+ else
+ {
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+ queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv,
+ instrument_option);
+ }
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 03d7a98fc58..3bbbc052149 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -588,6 +588,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ QueryDesc *prep_qd = NULL;
if (es->memory)
{
@@ -640,8 +641,31 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
pstate->p_queryEnv);
plan_list = cplan->stmt_list;
- if (AcquireExecutorLocks(cplan))
+ if (!CachedPlanCanPrep(cplan, entry->plansource))
+ {
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ continue;
+ }
+
+ prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, plan_list),
+ query_string,
+ GetActiveSnapshot(),
+ InvalidSnapshot,
+ None_Receiver, /* ExplainOnePlan will fix */
+ paramLI,
+ pstate->p_queryEnv,
+ 0 /* ExplainOnePlan will fix */ );
+ if (ExecutorPrepAndLock(prep_qd,
+ CurrentResourceOwner,
+ es->generic ? EXEC_FLAG_EXPLAIN_GENERIC : 0,
+ &cplan->is_valid))
break;
+
+ /* Try again. */
+ ExecutorPrepCleanup(prep_qd);
+ FreeQueryDesc(prep_qd);
ReleaseCachedPlan(cplan, CurrentResourceOwner);
}
@@ -664,6 +688,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
+ Assert(prep_qd == NULL || list_length(plan_list) == 1);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
@@ -671,7 +696,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ prep_qd);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, paramLI);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2b9397b72f3..1e81377cfd8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -333,6 +333,124 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * LockRangeTableRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksPrepared() and ExecutorPrepAndLock().
+ */
+static void
+LockRangeTableRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not fail
+ * if it's been dropped entirely --- we'll just transiently acquire a
+ * non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksPrepared
+ *
+ * Acquire or release execution locks using pruning results already computed
+ * by ExecutorPrep() and stored in queryDesc->estate.
+ *
+ * This is intended for single-statement reused generic-plan paths that
+ * choose pruning-aware locking instead of the conservative
+ * AcquireExecutorLocks() path.
+ */
+static void
+AcquireExecutorLocksPrepared(QueryDesc *queryDesc, bool acquire)
+{
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ EState *estate = queryDesc->estate;
+ Bitmapset *lock_relids;
+ ListCell *lc;
+
+ Assert(queryDesc != NULL);
+ Assert(estate != NULL);
+ Assert(plannedstmt != NULL);
+ Assert(plannedstmt->commandType != CMD_UTILITY);
+
+ lock_relids = bms_difference(estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * Keep the first result relation of each ModifyTable locked even if
+ * pruning removed all target partitions. ExecInitModifyTable() relies on
+ * one such relation remaining available.
+ */
+ foreach(lc, plannedstmt->firstResultRels)
+ {
+ Index rti = lfirst_int(lc);
+
+ lock_relids = bms_add_member(lock_relids, rti);
+ }
+
+ LockRangeTableRelids(plannedstmt->rtable, lock_relids, acquire);
+
+ bms_free(lock_relids);
+
+}
+
+/*
+ * ExecutorPrepAndLock
+ * Perform pruning-aware locking for a single PlannedStmt.
+ *
+ * Locks unprunable relations first, then runs ExecutorPrep() to
+ * determine which partitions survive initial pruning, then locks
+ * only those survivors. Checks *is_valid after each locking step
+ * to detect plan invalidation (e.g., from concurrent DDL or DDL
+ * triggered by a pruning expression).
+ *
+ * Returns true if the plan is still valid and all needed locks are
+ * held. Returns false if the plan was invalidated at any point, in
+ * which case all acquired locks have been released and the caller
+ * should discard the QueryDesc and retry with a fresh plan.
+ */
+bool
+ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid)
+{
+ PlannedStmt *pstmt = queryDesc->plannedstmt;
+
+ /* Lock unprunable rels before pruning can access them. */
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, true);
+ if (!*is_valid)
+ {
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ /* Run pruning and lock survivors. */
+ ExecutorPrep(queryDesc, owner, eflags);
+ AcquireExecutorLocksPrepared(queryDesc, true);
+ if (!*is_valid)
+ {
+ AcquireExecutorLocksPrepared(queryDesc, false);
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ return true;
+}
+
/*
* ExecutorPrep
*
@@ -391,6 +509,30 @@ ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
CurrentResourceOwner = oldowner;
}
+/*
+ * ExecutorPrepCleanup
+ * Clean up an EState that was created by ExecutorPrep() but never
+ * passed to ExecutorStart(). This happens when the plan is
+ * invalidated between prep and execution, and the caller must
+ * discard the prepped state before retrying with a fresh plan.
+ *
+ * Unlike ExecutorEnd(), this does not expect a fully initialized
+ * plan state tree -- only the range table relations and the
+ * EState itself need to be freed.
+ */
+void
+ExecutorPrepCleanup(QueryDesc *queryDesc)
+{
+ EState *estate = queryDesc->estate;
+
+ if (estate == NULL)
+ return;
+
+ ExecCloseRangeTableRelations(estate);
+ FreeExecutorState(estate);
+ queryDesc->estate = NULL;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 478cb01783c..350096bfbe7 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -5133,8 +5133,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksPrepared(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -5148,6 +5148,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f4689e7c9f8..4cddac7f2fc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -675,6 +675,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ff0e875f2a2..6ee51f06920 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,24 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of initially
+ * prunable relations. We use bms_next_member() to get the
+ * lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition expansion
+ * preserves RT index order. ExecInitModifyTable() asserts that the
+ * recorded index matches what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 4699b53cab7..53c50ab0fce 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,7 +59,9 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
-static bool PortalLockCachedPlan(Portal portal);
+static bool PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **queryDesc_p);
/*
@@ -488,21 +490,6 @@ restart:
* non-default nesting level for the snapshot.
*/
- /*
- * If the portal is backed by a cached plan, acquire execution
- * locks via PortalLockCachedPlan(). If the plan is
- * invalidated during locking, it replans and may change the
- * portal strategy, requiring us to restart PortalStart().
- */
- if (portal->cplan)
- {
- if (PortalLockCachedPlan(portal))
- {
- PopActiveSnapshot();
- goto restart;
- }
- }
-
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
@@ -516,6 +503,26 @@ restart:
portal->queryEnv,
0);
+ /*
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). For eligible plans
+ * (single-statement reused generic), this performs
+ * pruning-aware locking: it runs ExecutorPrep() on the
+ * QueryDesc to determine which partitions survive initial
+ * pruning, then locks only those. If the plan is invalidated
+ * during this process, it replans and rebuilds the QueryDesc.
+ * If replanning changes the portal strategy, we must restart
+ * PortalStart() to redispatch.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal, true, params, &queryDesc))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
@@ -555,7 +562,7 @@ restart:
case PORTAL_ONE_MOD_WITH:
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -611,7 +618,7 @@ restart:
*/
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -1828,15 +1835,32 @@ EnsurePortalSnapshotExists(void)
* Acquire execution locks for a cached-plan-backed portal,
* retrying with a fresh plan if the current one is invalidated.
*
+ * If do_prep is true and the plan is eligible (single-statement reused
+ * generic plan), performs pruning-aware locking via ExecutorPrep() and
+ * populates portal->queryDesc with the prepped QueryDesc. Otherwise
+ * falls back to locking all relations in the plan.
+ *
* Returns true if replanning changed portal->strategy, meaning the
- * caller must redispatch. Returns false once locks are held.
+ * caller must redispatch. Returns false once locks are held and the
+ * plan is valid for execution.
*/
static bool
-PortalLockCachedPlan(Portal portal)
+PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **prep_qd)
{
PortalStrategy start_strategy = portal->strategy;
- if (AcquireExecutorLocks(portal->cplan))
+ if (do_prep && CachedPlanCanPrep(portal->cplan, portal->plansource))
+ {
+ Assert(prep_qd);
+ if (ExecutorPrepAndLock(*prep_qd, portal->resowner, 0,
+ &portal->cplan->is_valid))
+ return false;
+ ExecutorPrepCleanup(*prep_qd);
+ FreeQueryDesc(*prep_qd);
+ }
+ else if (AcquireExecutorLocks(portal->cplan))
return false;
/* Replan. Locks will be taken freshly. */
@@ -1852,5 +1876,15 @@ PortalLockCachedPlan(Portal portal)
if (portal->strategy != start_strategy)
return true;
+ if (prep_qd)
+ {
+ Assert(list_length(portal->stmts) == 1);
+ *prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->sourceText,
+ GetActiveSnapshot(), InvalidSnapshot,
+ None_Receiver, params,
+ portal->queryEnv, 0);
+ }
+
return false;
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index f7fe366859c..fca2f84081e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1977,6 +1977,22 @@ AcquireExecutorLocks(CachedPlan *cplan)
return true;
}
+/*
+ * CachedPlanCanPrep
+ * Check whether a cached plan is eligible for pruning-aware locking
+ * via ExecutorPrepAndLock().
+ *
+ * Only single-statement reused generic plans with a non-utility command
+ * qualify.
+ */
+bool
+CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource)
+{
+ return (cplan == plansource->gplan &&
+ list_length(cplan->stmt_list) == 1 &&
+ linitial_node(PlannedStmt, cplan->stmt_list)->commandType != CMD_UTILITY);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..3a03355e6b6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -69,7 +69,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters);
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 33bbdbfeffb..093be9bd24b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid);
+extern void ExecutorPrepCleanup(QueryDesc *queryDesc);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 27a2c6815b7..a5d00633b4b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 14a1dfed2b9..7f6f7cda781 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -120,6 +120,16 @@ typedef struct PlannedStmt
/* RT indexes of relations targeted by INSERT/UPDATE/DELETE/MERGE */
Bitmapset *resultRelationRelids;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index e0fc403e717..2941d3a301b 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -254,4 +254,6 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+extern bool CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource);
+
#endif /* PLANCACHE_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 849049f9c51..ec73866486e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4956,3 +4956,187 @@ select * from (select a, b from phv_boolpart) t
(2 rows)
drop table phv_boolpart;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+ a | b
+---+---
+ 1 | 0
+ 2 | 1
+(2 rows)
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index d58534ca1cd..54077294dce 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -402,3 +402,66 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 359a9208056..a98844d14f8 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1518,3 +1518,119 @@ select * from (select a, b from phv_boolpart) t
group by grouping sets (a, b);
drop table phv_boolpart;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index aed388d03a1..90b6c5f82bf 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -228,3 +228,55 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2026-05-28 13:13 Thom Brown <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Thom Brown @ 2026-05-28 13:13 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Chao Li <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
On Thu, 28 May 2026 at 09:14, Amit Langote <[email protected]> wrote:
>
> Hi Thom,
>
> On Wed, May 27, 2026 at 9:03 PM Thom Brown <[email protected]> wrote:
> >
> > On Sat, 4 Apr 2026 at 13:11, Amit Langote <[email protected]> wrote:
> > >
> > > Attached is a redesigned version. While working on the previous
> > > design, I grew increasingly uncomfortable with CachedPlanPrepData --
> > > it was smuggling executor state out of GetCachedPlan() through an
> > > out-parameter, which papered over the real problem: GetCachedPlan()
> > > was doing too much. The main change in this version is architectural:
> > > GetCachedPlan() no longer acquires execution locks. Callers now own
> > > that responsibility, which is natural because each call site iterates
> > > stmt_list differently and manages execution state in its own way --
> > > and it lets them choose between conservative lock-all and
> > > pruning-aware locking where appropriate.
> > >
> > > Non-portal call sites remain on the conservative path for now.
> > > _SPI_execute_plan requires care around snapshot setup, which happens
> > > after plan fetch rather than before. SQL functions have a different
> > > issue: init_execution_state() fetches the plan while postquel_start()
> > > handles execution, with execution_state containers in between, making
> > > it harder to thread a prepped QueryDesc through. The portal path and
> > > EXPLAIN EXECUTE cover the most common
> > > prepared-statement-with-partitions workloads; the remaining sites can
> > > be converted incrementally.
> > >
> > > This is now starting to feel closer to what Tom suggested back in
> > > January 2023 [1], where he proposed getting rid of
> > > AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing
> > > lock acquisition out to callers. He noted that "we'd be pushing the
> > > responsibility for looping back and re-planning out to fairly
> > > high-level calling code" and that "we'd definitely be changing some
> > > fundamental APIs." That is the direction I came around to over the
> > > last couple of weeks while wrestling with CachedPlanPrepData. The
> > > reverted approach also tried to follow Tom's direction but moved
> > > locking into ExecutorStart(), which forced it to handle plan
> > > invalidation from inside the executor by mutating the CachedPlan
> > > in-place. This version moves locking out to the callers instead, so
> > > the executor and plan cache never reach into each other.
> > >
> > > The series is now four patches:
> > >
> > > 0001: Move execution lock acquisition out of GetCachedPlan(). Adds
> > > AcquireExecutorLocks() as a caller-facing function with validity check
> > > and retry. Adds PortalLockCachedPlan() in pquery.c to centralize the
> > > portal retry logic. All callers are converted. No behavioral change.
> > >
> > > 0002: Refactor executor's initial partition pruning setup. Cleanup
> > > only, no behavioral change.
> > >
> > > 0003: Introduce ExecutorPrep() and refactor executor startup. Factors
> > > range table init, permission checks, and initial pruning out of
> > > InitPlan(). Scaffolding for 0004; all callers still go through the
> > > normal ExecutorStart() path.
> > >
> > > 0004: Use pruning-aware locking for single-statement cached plans.
> > > Adds ExecutorPrepAndLock() which locks unprunable relations, runs
> > > ExecutorPrep() to determine surviving partitions, then locks only
> > > those. Extends PortalLockCachedPlan() with a pruning-aware path for
> > > eligible plans. Multi-statement CachedPlans (from rule rewriting)
> > > always use conservative locking. In principle, this could be relaxed
> > > if the planner can prove that no pruning expression reads state
> > > modified by an earlier statement, but that is left for a future patch.
> > > Includes regression tests.
> > >
> > > In case it's not clear, I'm not targeting v19 at this point. I'd like
> > > to get this into v20 CF1 and would welcome review from anyone
> > > interested.
> >
> > After not having looked at this in close to 2 years, I thought I'd
> > give it another look.
>
> Thanks for taking a look.
>
> > Not found any user-facing issues, and I'm liking
> > seeing so few locks in pg_locks. I can see that with pruning disabled,
> > the fallback works, pruning-aware locking is working via SPI through
> > plpgsql, running ALTER between executions and also invalidating
> > indexes force replans, and it's looking good.
> >
> > But I also think there might be a bug in patch 0001, but I'd
> > appreciate checking my reasoning because I'm not fully confident I've
> > been diligent enough.
> >
> > When PortalStart() opens a SELECT cursor that's backed by a cached
> > plan, it does roughly the following. It builds a queryDesc (an
> > executor-side struct), one of whose fields is a pointer into the plan
> > tree inside the portal's cached plan. Then it calls
> > PortalLockCachedPlan() to acquire the necessary locks, and finally
> > hands the queryDesc over to the executor.
> >
> > My worry is about what happens if the cached plan turns out to be
> > stale, for instance because someone ran DDL on a referenced table. In
> > that case PortalLockCachedPlan() throws the old plan away (via
> > ReleaseCachedPlan) and fetches a freshly-built replacement, updating
> > the portal's own pointers to match. But the queryDesc from earlier
> > isn't touched. Its plan pointer still references the old, now-released
> > plan. From what I can see, once that old plan's last reference is
> > dropped its memory can be freed, which would leave the executor
> > reading from freed memory in the next step.
> >
> > The bit I'm least sure about is whether the old plan's memory really
> > does get reclaimed straight away when its refcount hits zero. If
> > something keeps it alive longer then this isn't a bug, or at least not
> > as bad as I'm making out. I had a look but couldn't convince myself
> > either way from the code alone. To actually hit this you'd need a
> > cursor on a cached plan, plus an invalidation arriving in the small
> > window between the portal being set up and the cursor being opened.
> > The race condition is brief, and I've not been able to hit it in
> > testing.
> >
> > The thing that got me thinking this is real: patch 0004 modifies
> > PortalLockCachedPlan() so that whenever it replans, it also rebuilds
> > the queryDesc. That's pretty much the fix I'd expect for this, which
> > makes me suspect somebody hit it at some point. But 0004 only applies
> > that fix on the new pruning-aware code path, and it was mentioned in
> > the thread that 0001 to 0003 might land before 0004. If so, master
> > would carry the bug in the gap between the two.
> >
> > I suspect a way to deal with it would be to move the CreateQueryDesc
> > call in the SELECT case to after PortalLockCachedPlan() returns, which
> > is what the other portal strategies already seem to do. Alternatively,
> > you could bring 0004's changes in this area into 0001 and have
> > PortalLockCachedPlan() always rebuild the queryDesc when it replans.
> >
> > If I've got this wrong and there's some lifetime mechanism I missed
> > that keeps the old plan's memory alive, then it's a non-issue and I'm
> > misreading the code. If I have got it wrong, could you please add
> > comments to make what is going on clearer?
>
> It's a real bug.
>
> You're right that if PortalLockCachedPlan() replans, the QueryDesc
> created before the call still points at the old PlannedStmt from the
> released plan. And yes, 0004 happens to fix it by rebuilding the
> QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
> broken on their own.
>
> Attached is an updated set with the fix: CreateQueryDesc now runs
> after PortalLockCachedPlan() returns, as you suggested. That said,
> I'll probably focus first on settling the plancache refactoring that
> spun off from this thread [1], and then start a new thread for the
> pruning-aware locking work on top of it, incorporating parts of this
> series.

Thanks.

I've done another pass. I see a reference to
AcquireExecutorLocksUnpruned(), but I can't find this function. Is
this supposed to be AcquireExecutorLocksPrepared()?

And also I have a question about the new firstResultRels code

If I've followed it right, the bit in setrefs.c records the
lowest-numbered RT index from leaf_result_relids as the
per-ModifyTable fallback that's used when all real targets get pruned
away, and the executor side looks it up via
linitial_int(node->resultRelations). For that to work those two have
to pick the same RT index, and the comment justifies it with
"partition expansion preserves RT index order". Where is that
preservation guaranteed?
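
To make the question concrete, here is a minimal standalone sketch, not
code from the patch. It assumes a 64-bit mask as a stand-in for the
Bitmapset (with lowest_member() playing the role of
bms_next_member(set, -1)) and a plain int array as a stand-in for
resultRelations; all function names are hypothetical. It only shows that
the two picks agree when the first list element is also the smallest RT
index, and disagree otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for bms_next_member(set, -1): lowest set bit of a 64-bit mask,
 * or -1 if the mask is empty.  Bit i represents RT index i.
 */
static int
lowest_member(uint64_t relids)
{
    for (int i = 0; i < 64; i++)
    {
        if (relids & (UINT64_C(1) << i))
            return i;
    }
    return -1;
}

/* Build a mask from an array of RT indexes (each must be in 0..63). */
static uint64_t
relids_from_list(const int *rtis, size_t n)
{
    uint64_t    relids = 0;

    for (size_t i = 0; i < n; i++)
    {
        assert(rtis[i] >= 0 && rtis[i] < 64);
        relids |= UINT64_C(1) << rtis[i];
    }
    return relids;
}

/*
 * True if the planner-side pick (lowest member of the set) equals the
 * executor-side pick (first element of the list).
 */
static bool
picks_agree(const int *result_rels, size_t n)
{
    if (n == 0)
        return true;
    return lowest_member(relids_from_list(result_rels, n)) == result_rels[0];
}
```

With a list such as {4, 5, 6} the picks agree; with {5, 4, 6} they do
not, which is why the ordering guarantee matters.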

And with the assertion in ExecInitModifyTable:

Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));

With writable CTEs producing more than one ModifyTable node the list
has several entries, so all the assert really checks is that some
recorded entry matches, not that the one recorded for this particular
node matches. If that's correct, then in a case where the wrong entry
happened to line up the right relation wouldn't be locked and nothing
would complain. Is there something that keeps these in order
somewhere?
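
A minimal standalone sketch of the difference I mean, again not code
from the patch: member_check() is a stand-in for list_member_int() over
a plain int array, and positional_check() is a hypothetical stricter
variant that assumes each ModifyTable node knows its own position
(mt_index) in the list. Whether such an index is available in the
executor is an assumption on my part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for list_member_int(): is rti anywhere in the list? */
static bool
member_check(const int *first_result_rels, size_t n, int rti)
{
    assert(first_result_rels != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (first_result_rels[i] == rti)
            return true;
    }
    return false;
}

/*
 * Hypothetical stricter check: does the entry recorded for ModifyTable
 * node number mt_index hold exactly this rti?
 */
static bool
positional_check(const int *first_result_rels, size_t n,
                 size_t mt_index, int rti)
{
    return mt_index < n && first_result_rels[mt_index] == rti;
}
```

With entries {2, 6, 9}, a node at position 0 that looks up rti 6 passes
the membership check but fails the positional one, which is the kind of
mismatch the current assertion would not catch.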

Thom
* Re: generic plans and "initial" pruning
@ 2026-05-29 08:56 Amit Langote <[email protected]>
parent: Thom Brown <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Amit Langote @ 2026-05-29 08:56 UTC (permalink / raw)
To: Thom Brown <[email protected]>; +Cc: Chao Li <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
On Thu, May 28, 2026 at 10:14 PM Thom Brown <[email protected]> wrote:
> On Thu, 28 May 2026 at 09:14, Amit Langote <[email protected]> wrote:
> > It's a real bug.
> >
> > You're right that if PortalLockCachedPlan() replans, the QueryDesc
> > created before the call still points at the old PlannedStmt from the
> > released plan. And yes, 0004 happens to fix it by rebuilding the
> > QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
> > broken on their own.
> >
> > Attached is an updated set with the fix: CreateQueryDesc now runs
> > after PortalLockCachedPlan() returns, as you suggested. That said,
> > I'll probably focus first on settling the plancache refactoring that
> > spun off from this thread [1], and then start a new thread for the
> > pruning-aware locking work on top of it, incorporating parts of this
> > series.
>
> Thanks.
>
> I've done another pass. I see a reference to
> AcquireExecutorLocksUnpruned(), but I can't find this function. Is
> this supposed to be AcquireExecutorLocksPrepared()?
You're right, stale comment. It should say
AcquireExecutorLocksPrepared(). Fixed.
> And also I have a question about the new firstResultRels code.
>
> If I've followed it right, the bit in setrefs.c records the
> lowest-numbered RT index from leaf_result_relids as the
> per-ModifyTable fallback that's used when all real targets get pruned
> away, and the executor side looks it up via
> linitial_int(node->resultRelations). For that to work those two have
> to pick the same RT index, and the comment justifies it with
> "partition expansion preserves RT index order". Where is that
> preservation guaranteed?
The ordering comes from expand_inherited_rtentry(), which adds child
partitions to the range table sequentially in partition bound order.
Since ModifyTable.resultRelations is built from the same expansion,
its first element is the lowest-numbered RT index among the leaf
partitions for that node. That is the same value
bms_next_member(leaf_result_relids, -1) returns from the Bitmapset,
because Bitmapset iteration returns members in ascending order. I've
added a comment in setrefs.c pointing to expand_inherited_rtentry() as
the source of this guarantee.
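
A minimal sketch of the ordering argument, using a toy 64-bit set in place of a real Bitmapset. The names toy_relids, toy_next_member(), toy_from_list() and toy_fallbacks_agree() are hypothetical and exist only for this illustration; they are not part of the patch or of PostgreSQL. The only behavior borrowed from the real API is that bms_next_member() walks members in ascending order and returns -2 when none remain.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a Bitmapset: bit i set means RT index i is a member. */
typedef uint64_t toy_relids;

/* Like bms_next_member(): smallest member greater than prev, or -2 if none. */
int
toy_next_member(toy_relids set, int prev)
{
	for (int i = prev + 1; i < 64; i++)
	{
		if (set & (UINT64_C(1) << i))
			return i;
	}
	return -2;
}

/* Build the toy set from a list of RT indexes, as setrefs.c would collect them. */
toy_relids
toy_from_list(const int *rtis, int n)
{
	toy_relids	set = 0;

	for (int i = 0; i < n; i++)
	{
		assert(rtis[i] >= 0 && rtis[i] < 64);
		set |= UINT64_C(1) << rtis[i];
	}
	return set;
}

/*
 * Returns 1 if the planner-side choice (lowest member of the set) equals the
 * executor-side choice (first element of the list), else 0.
 */
int
toy_fallbacks_agree(const int *result_relations, int n)
{
	toy_relids	leaf_result_relids = toy_from_list(result_relations, n);

	return n > 0 &&
		toy_next_member(leaf_result_relids, -1) == result_relations[0];
}
```

With the list in ascending order, as partition expansion is argued to produce, both sides pick the same RT index. If the list were ever built in a different order, the two choices would diverge, which is exactly the dependency the new comment in setrefs.c documents.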
> And with the assertion in ExecInitModifyTable:
>
> Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
>
> With writable CTEs producing more than one ModifyTable node the list
> has several entries, so all the assert really checks is that some
> recorded entry matches, not that the one recorded for this particular
> node matches. If that's correct, then in a case where the wrong entry
> happened to line up, the right relation wouldn't be locked and nothing
> would complain. Is there something that keeps these in order
> somewhere?
This is a fair observation -- the Assert checks membership in the
global list rather than per-node correspondence. But node A's rti
can't accidentally pass the Assert by matching an entry recorded for
node B. Each ModifyTable node gets its own partition expansion with
distinct RT entries. In a writable CTE like:
    WITH upd1 AS (UPDATE t SET ...),
         upd2 AS (UPDATE t SET ...)
    UPDATE t SET ...
each UPDATE creates a separate set of leaf partition RT entries --
upd1 might get RT indexes 5,6,7, upd2 gets 8,9,10, and the main UPDATE
gets 11,12,13. The global firstResultRels list would be [5, 8, 11].
When ExecInitModifyTable falls back to linitial_int(resultRelations)
for a given node, it finds that node's own entry, because the RT index
sets are disjoint across nodes.
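
The disjointness argument can be sketched with plain arrays. ToyModifyTable and toy_matching_node() are hypothetical names made up for this illustration, and the RT indexes are the example values from above rather than anything a real plan is guaranteed to contain. The global list is modelled implicitly: entry i is the first result relation of node i.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a ModifyTable node's resultRelations list. */
typedef struct ToyModifyTable
{
	const int  *result_relations;	/* leaf partition RT indexes, ascending */
	size_t		nrels;
} ToyModifyTable;

/*
 * Find which node recorded the global-list entry that node "which" falls back
 * to.  Returns that node's position, or -1 if no entry matches.
 */
int
toy_matching_node(const ToyModifyTable *nodes, size_t nnodes, size_t which)
{
	int			fallback_rti;

	assert(which < nnodes && nodes[which].nrels > 0);
	fallback_rti = nodes[which].result_relations[0];

	for (size_t i = 0; i < nnodes; i++)
	{
		/* Entry i of the global list is node i's first result relation. */
		if (nodes[i].nrels > 0 &&
			nodes[i].result_relations[0] == fallback_rti)
			return (int) i;
	}
	return -1;
}
```

For the three nodes above (5,6,7 / 8,9,10 / 11,12,13) each node's fallback resolves to its own entry. A cross-node match would require two nodes to share an RT index, which separate partition expansions do not produce.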
That said, it's worth being explicit about what protections exist at
each layer, since this is safety-critical code:
1. AcquireExecutorLocksPrepared(), added by 0004, locks every entry in
firstResultRels unconditionally. So regardless of which rti a
ModifyTable node falls back to, the relation will be locked.
2. ExecGetRangeTableRelation() has two checks when opening a relation.
For non-result relations (isResultRel=false), it checks
es_unpruned_relids and raises an ERROR in release builds if the
relation was pruned. For result relations (isResultRel=true), that
check is intentionally skipped -- it has to be, because at least one
result relation per ModifyTable node must remain openable even when
all partitions are pruned, since executor code paths like ExecMerge()
and ExecInitPartitionInfo() rely on resultRelInfo[0] being initialized
(see commit 28317de723b). The remaining protection for result
relations is Assert(CheckRelationLockedByMe()) inside table_open,
which fires in debug builds.
3. I've tightened ExecInitModifyTable to close this gap: the
all-pruned fallback path now raises an elog(ERROR) in release builds
if linitial_int(resultRelations) is not found in firstResultRels,
rather than just an Assert. This gives result relations a
production-visible check comparable to what es_unpruned_relids
provides for scan relations.
So the net effect is that for scan relations, opening a
pruned-and-unlocked relation is caught by an ERROR in production via
es_unpruned_relids. For result relations on the all-pruned fallback
path, it's now also caught by an ERROR in production via the
firstResultRels check in ExecInitModifyTable. The locking in
AcquireExecutorLocksPrepared() ensures the relation is always locked
regardless.
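
As a rough model of the two production-visible checks summarized above, the sketch below returns an error code where the real code would raise an ERROR. Everything in it is a simplification under stated assumptions: ToyOpenResult and toy_check_open() are hypothetical names, the unpruned set is a toy 64-bit mask rather than a Bitmapset, and the result-relation branch models only the all-pruned fallback path in ExecInitModifyTable, not every result-relation open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum ToyOpenResult
{
	TOY_OPEN_OK,
	TOY_ERR_SCAN_REL_PRUNED,		/* models the es_unpruned_relids check */
	TOY_ERR_FALLBACK_NOT_RECORDED	/* models the firstResultRels check */
} ToyOpenResult;

/*
 * Decide whether opening RT index "rti" should be allowed.
 *
 * Scan relations must be in the unpruned set.  A result relation used as the
 * all-pruned fallback must appear in the recorded first-result-relation list.
 */
ToyOpenResult
toy_check_open(bool is_result_rel, int rti, uint64_t unpruned_relids,
			   const int *first_result_rels, size_t nfirst)
{
	assert(rti > 0 && rti < 64);

	if (!is_result_rel)
	{
		if ((unpruned_relids & (UINT64_C(1) << rti)) == 0)
			return TOY_ERR_SCAN_REL_PRUNED;
		return TOY_OPEN_OK;
	}

	for (size_t i = 0; i < nfirst; i++)
	{
		if (first_result_rels[i] == rti)
			return TOY_OPEN_OK;
	}
	return TOY_ERR_FALLBACK_NOT_RECORDED;
}
```

Neither branch says anything about locks being held; in the patch that part is covered separately by AcquireExecutorLocksPrepared() locking every firstResultRels entry.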
Thanks again for the review. A close look at these aspects by someone
other than me is very useful.
--
Thanks, Amit Langote
Attachments:
[application/octet-stream] v13-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch (8.8K, 2-v13-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch)
From 05c92346e2bec4c8ec9a7cf45ec572c15d64481f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 26 Mar 2026 16:08:46 +0900
Subject: [PATCH v13 3/4] Introduce ExecutorPrep and refactor executor startup
Move permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper.
ExecutorStart() invokes ExecutorPrep() when QueryDesc->estate is
NULL, keeping current behavior unchanged. If QueryDesc->estate is
already set, ExecutorStart() reuses it.
This is preparatory refactoring only. No caller outside the
executor supplies a prebuilt EState in this commit.
In assert builds, verify that the expected relation locks are held
when entering ExecutorStart().
---
src/backend/executor/README | 10 ++-
src/backend/executor/execMain.c | 152 ++++++++++++++++++++++++++------
src/include/executor/execdesc.h | 2 +-
3 files changed, 132 insertions(+), 32 deletions(-)
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..890bc3d9333 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,17 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart, or implicitly from ExecutorStart
+ if not done earlier. Creates the EState in QueryDesc, performs
+ range table initialization, permission checks, and initial
+ partition pruning.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if QueryDesc.estate is NULL)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4b30f768680..2b9397b72f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -76,6 +77,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
+static void ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags);
static void InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
@@ -147,7 +149,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +174,67 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When no
+ * prep EState was provided, AcquireExecutorLocks() should have locked
+ * every relation in the plan. When one was provided, pruning-aware
+ * locking should have locked at least the unpruned relations. Both
+ * checks are skipped in parallel workers, which acquire relation locks
+ * lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ ExecutorPrep(queryDesc, CurrentResourceOwner, eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking should
+ * have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -274,6 +333,64 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep
+ *
+ * Build the initial executor state for queryDesc before ExecutorStart().
+ *
+ * This creates the EState and performs the subset of executor startup that
+ * does not require plan-tree initialization, allowing that work to be reused
+ * by callers that need executor state before ExecutorStart():
+ *
+ * - initialize the range table
+ * - perform permission checks
+ * - perform initial partition pruning
+ *
+ * On success, queryDesc->estate is set and can later be reused by
+ * ExecutorStart() instead of rebuilding the same state.
+ *
+ * Caller must ensure that queryDesc->snapshot is active.
+ */
+static void
+ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ PlannedStmt *pstmt;
+
+ Assert(queryDesc != NULL);
+
+ if (queryDesc->operation == CMD_UTILITY)
+ return;
+
+ Assert(ActiveSnapshotSet());
+ Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ Assert(queryDesc->estate == NULL);
+
+ pstmt = queryDesc->plannedstmt;
+
+ estate = CreateExecutorState();
+ queryDesc->estate = estate;
+
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = queryDesc->params;
+ estate->es_queryEnv = queryDesc->queryEnv;
+ estate->es_top_eflags = eflags;
+
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -849,37 +966,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 37c2576e4bc..aea5ec8ea02 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -45,7 +45,7 @@ typedef struct QueryDesc
int query_instr_options; /* OR of InstrumentOption flags for
* query_instr */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
--
2.47.3
[application/octet-stream] v13-0002-Refactor-executor-s-initial-partition-pruning-se.patch (7.3K, 3-v13-0002-Refactor-executor-s-initial-partition-pruning-se.patch)
From 29e5ad113f6974a94fbcf984b43fa3ed86f57632 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:38 +0900
Subject: [PATCH v13 2/4] Refactor executor's initial partition pruning setup
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
ExecCreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called before PARAM_EXEC parameters are
set up. A later commit needs this when running pruning state setup
outside of InitPlan().
No behavioral change.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++++---------
1 file changed, 48 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..2a3af006f77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1978,7 +1977,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
* and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,29 +1995,31 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ Assert(estate->es_part_prune_results == NULL);
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
/*
* Perform initial pruning steps, if any, and save the result
- * bitmapset or NULL as described in the header comment.
+ * bitmapset or NULL as described in the header comment. RT indexes
+ * of surviving partitions would be added to validsubplan_rtis.
+ *
+ * Note that when do_initial_prune is false,
+ * CreatePartitionPruneState() would have already added the RT indexes
+ * of all leaf partitions to es_unpruned_relids directly.
*/
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2137,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2376,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,9 +2389,28 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
+ }
+ }
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
}
+#endif
}
j++;
@@ -2490,9 +2508,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2539,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when ExecCreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
--
2.47.3
[application/octet-stream] v13-0001-Move-execution-lock-acquisition-out-of-GetCached.patch (16.2K, 4-v13-0001-Move-execution-lock-acquisition-out-of-GetCached.patch)
From a3214580f2ce1983a111af07ccb092ba03c812c8 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 18:38:34 +0900
Subject: [PATCH v13 1/4] Move execution lock acquisition out of
GetCachedPlan()
GetCachedPlan() previously acquired execution locks on all plan
relations as part of cached plan validation. Move this
responsibility to callers, making GetCachedPlan() return a valid
plan without holding execution locks.
Add AcquireExecutorLocks() as the caller-facing function: it locks
all relations in the plan, checks that the plan is still valid
afterward, and returns false if it was invalidated so the caller
can retry with a fresh plan.
For portal-backed callers, add PortalLockCachedPlan() in pquery.c
which wraps the lock-check-retry loop and handles the case where
replanning changes the portal strategy. Store the CachedPlanSource
pointer in PortalData so retry can call GetCachedPlan() without
the caller threading it through.
Adjust all non-portal GetCachedPlan() callers (SPI, EXPLAIN
EXECUTE, SQL functions) to call AcquireExecutorLocks() explicitly
after fetching the plan.
No behavioral change. This separates plan retrieval from execution
setup, allowing a later commit to substitute pruning-aware locking
for eligible plans.
---
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 14 +++++-
src/backend/executor/functions.c | 14 ++++--
src/backend/executor/spi.c | 22 +++++++--
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 70 ++++++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 44 +++++++++++++-----
src/backend/utils/mmgr/portalmem.c | 7 +++
src/include/utils/plancache.h | 1 +
src/include/utils/portal.h | 3 ++
10 files changed, 157 insertions(+), 21 deletions(-)
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..cf5deec4943 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..03d7a98fc58 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ entry->plansource,
cplan);
/*
@@ -632,8 +633,17 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
- cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ for (;;)
+ {
+ cplan = GetCachedPlan(entry->plansource, paramLI,
+ CurrentResourceOwner,
+ pstate->p_queryEnv);
+ plan_list = cplan->stmt_list;
+
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ }
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..2afb814a435 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -654,6 +654,7 @@ static bool
init_execution_state(SQLFunctionCachePtr fcache)
{
CachedPlanSource *plansource;
+ CachedPlan *cplan;
execution_state *preves = NULL;
execution_state *lasttages = NULL;
int nstmts;
@@ -696,10 +697,15 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
- fcache->cplan = GetCachedPlan(plansource,
- fcache->paramLI,
- fcache->cowner,
- NULL);
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, fcache->paramLI,
+ fcache->cowner, NULL);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, fcache->cowner);
+ }
+ fcache->cplan = cplan;
/*
* If necessary, make esarray[] bigger to hold the needed state.
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..268cd10bde8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ plansource,
cplan);
/*
@@ -2106,6 +2107,16 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
_SPI_current->queryEnv);
Assert(cplan == plansource->gplan);
+ if (!AcquireExecutorLocks(cplan))
+ {
+ /* Plan invalidated during locking; get a fresh one. */
+ ReleaseCachedPlan(cplan,
+ plan->saved ? CurrentResourceOwner : NULL);
+ cplan = GetCachedPlan(plansource, NULL,
+ plan->saved ? CurrentResourceOwner : NULL,
+ _SPI_current->queryEnv);
+ }
+
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
@@ -2574,9 +2585,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
- cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, options->params,
+ plan_owner, _SPI_current->queryEnv);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, plan_owner);
+ }
stmt_list = cplan->stmt_list;
/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dbef734a93f..2929f158338 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1243,6 +1243,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2042,6 +2043,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ psrc,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index ee731000820..4699b53cab7 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,6 +59,7 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
+static bool PortalLockCachedPlan(Portal portal);
/*
@@ -463,6 +464,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
portal->strategy = ChoosePortalStrategy(portal->stmts);
+restart:
+
/*
* Fire her up according to the strategy
*/
@@ -485,6 +488,21 @@ PortalStart(Portal portal, ParamListInfo params,
* non-default nesting level for the snapshot.
*/
+ /*
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). If the plan is
+ * invalidated during locking, it replans and may change the
+ * portal strategy, requiring us to restart PortalStart().
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
+
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
@@ -535,6 +553,11 @@ PortalStart(Portal portal, ParamListInfo params,
case PORTAL_ONE_RETURNING:
case PORTAL_ONE_MOD_WITH:
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
/*
* We don't start the executor until we are told to run the
@@ -578,7 +601,20 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+
+ /*
+ * GetCachedPlan() no longer acquires execution locks, so we
+ * must do it here. Multi-statement plans always use
+ * conservative locking (all partitions locked); pruning-aware
+ * locking is not feasible because PortalRunMulti() executes
+ * statements sequentially with CCI between them.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -1786,3 +1822,35 @@ EnsurePortalSnapshotExists(void)
/* PushActiveSnapshotWithLevel might have copied the snapshot */
portal->portalSnapshot = GetActiveSnapshot();
}
+
+/*
+ * PortalLockCachedPlan
+ * Acquire execution locks for a cached-plan-backed portal,
+ * retrying with a fresh plan if the current one is invalidated.
+ *
+ * Returns true if replanning changed portal->strategy, meaning the
+ * caller must redispatch. Returns false once locks are held.
+ */
+static bool
+PortalLockCachedPlan(Portal portal)
+{
+ PortalStrategy start_strategy = portal->strategy;
+
+ if (AcquireExecutorLocks(portal->cplan))
+ return false;
+
+ /* Replan. Locks will be taken freshly. */
+ ReleaseCachedPlan(portal->cplan, portal->resowner);
+ portal->cplan = NULL;
+ portal->stmts = NIL;
+ portal->cplan = GetCachedPlan(portal->plansource,
+ portal->portalParams,
+ portal->resowner,
+ portal->queryEnv);
+ portal->stmts = portal->cplan->stmt_list;
+ portal->strategy = ChoosePortalStrategy(portal->stmts);
+ if (portal->strategy != start_strategy)
+ return true;
+
+ return false;
+}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..f7fe366859c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,7 +100,7 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksInt(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -945,8 +945,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, the generic plan may be reused as a valid cached
+ * plan. Any execution-time setup, including lock acquisition, is the
+ * caller's responsibility.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -983,8 +984,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
-
/*
* If plan was transient, check to see if TransactionXmin has
* advanced, and if so invalidate it.
@@ -1003,9 +1002,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
/* Successfully revalidated and locked the query. */
return true;
}
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
}
/*
@@ -1282,8 +1278,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid but no execution locks are held.
+ * The caller must call AcquireExecutorLocks() before executing.
+ * For freshly built plans (custom or new generic), the planner
+ * already holds the needed locks, so AcquireExecutorLocks() is
+ * redundant but harmless.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1906,9 +1905,11 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksInt(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1956,27 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * AcquireExecutorLocks
+ * Acquire execution locks on all relations in a cached plan.
+ *
+ * Returns true if the plan is still valid after locking. Returns
+ * false if the plan was invalidated while locks were being acquired,
+ * in which case the locks have been released and the caller should
+ * discard this plan and retry with a fresh one from GetCachedPlan().
+ */
+bool
+AcquireExecutorLocks(CachedPlan *cplan)
+{
+ AcquireExecutorLocksInt(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocksInt(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..613f3be30b3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -272,6 +272,10 @@ CreateNewPortal(void)
* the passed plan trees have adequate lifetime. Typically this is done by
* copying them into the portal's context.
*
+ * If plansource is provided, it is the CachedPlanSource that produced
+ * cplan. PortalLockCachedPlan() uses it to fetch a fresh plan if the
+ * current one is invalidated during execution lock acquisition.
+ *
* The caller is also responsible for ensuring that the passed prepStmtName
* (if not NULL) and sourceText have adequate lifetime.
*
@@ -286,6 +290,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +304,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->plansource = plansource;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
@@ -517,6 +523,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* drop cached plan reference, if any */
PortalReleaseCachedPlan(portal);
+ portal->plansource = NULL;
/*
* If portal has a snapshot protecting its data, release that. This needs
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..e0fc403e717 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -241,6 +241,7 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern bool AcquireExecutorLocks(CachedPlan *cplan);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..3af535362cd 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,8 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ CachedPlanSource *plansource; /* CachedPlanSource, for replanning on
+ * invalidation */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +242,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
[application/octet-stream] v13-0004-Use-pruning-aware-locking-for-single-statement-c.patch (40.8K, 5-v13-0004-Use-pruning-aware-locking-for-single-statement-c.patch)
From 5785e0903b867f024e4b675783dfd76dc00ee733 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 20:43:14 +0900
Subject: [PATCH v13 4/4] Use pruning-aware locking for single-statement cached
plans

For single-statement reused generic plans, perform initial partition
pruning before acquiring execution locks, then lock only the
surviving partitions.

Add ExecutorPrepAndLock() which encapsulates the pruning-aware lock
sequence: lock unprunable relations, call ExecutorPrep() to run
initial pruning, then lock survivors. Plan validity is checked
after each step; ExecutorPrepCleanup() handles the case where the
plan is invalidated between prep and execution.

Extend PortalLockCachedPlan() to use the pruning-aware path for
eligible plans (single-statement reused generic, non-utility).
All other cases continue using the conservative lock-all path
from the previous commit.

Track firstResultRels in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving ExecInitModifyTable()
assumptions about the first result relation being available.

Multi-statement CachedPlans (from rule rewriting) always use
conservative locking, since PortalRunMulti() executes statements
sequentially with CCI between them and later statements' pruning
expressions may depend on earlier ones' effects. In principle,
this could be relaxed if the planner can prove that no pruning
expression reads state modified by an earlier statement, but that
is left for a future patch.

Regression tests are included to verify:

- Only surviving partitions are locked when pruning is enabled, and
  all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
  all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
  a pruning expression) discards the prep state and replans cleanly.
- Multi-statement CachedPlans (from rule rewriting) fall back to
  locking all partitions, avoiding stale pruning results.

Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this
was an optimization; it is now a correctness requirement, because
pruned partitions may not be locked.
---
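The lock sequence described in the commit message can be modelled outside
the backend. What follows is a minimal standalone sketch under stated
assumptions, not code from the patch: ToyPlan, ToyLockTable and the toy_*
functions are hypothetical names, relations are bits in a 32-bit mask, and
"pruning" simply keeps the partition whose bit index equals the parameter.
It leaves out firstResultRels and the ExecutorPrepCleanup() step, and only
shows the "lock, recheck validity, roll back on failure" shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
typedef struct ToyPlan
{
	uint32_t	unprunable;		/* bit i set: relation i is never pruned */
	uint32_t	prunable;		/* bit i set: relation i is a prunable partition */
	bool		is_valid;		/* cleared when the plan is invalidated */
} ToyPlan;

typedef struct ToyLockTable
{
	uint32_t	held;			/* bit i set: lock on relation i is held */
	uint32_t	invalidate_on_lock; /* locking one of these relations models
									 * noticing concurrent DDL */
} ToyLockTable;

/* Take locks; as a side effect the plan may turn out to be invalid. */
static void
toy_lock(ToyLockTable *lt, ToyPlan *plan, uint32_t relids)
{
	lt->held |= relids;
	if (relids & lt->invalidate_on_lock)
		plan->is_valid = false;
}

static void
toy_unlock(ToyLockTable *lt, uint32_t relids)
{
	lt->held &= ~relids;
}

/* Toy "initial pruning": keep only the partition selected by param. */
static uint32_t
toy_prune(const ToyPlan *plan, int param)
{
	assert(param >= 0 && param < 32);
	return plan->prunable & (UINT32_C(1) << param);
}

/*
 * Returns true with all needed locks held, or false with every lock taken
 * here released again, so the caller can retry with a fresh plan.
 */
static bool
toy_prep_and_lock(ToyLockTable *lt, ToyPlan *plan, int param)
{
	uint32_t	survivors;

	/* Step 1: lock unprunable relations before pruning can touch them. */
	toy_lock(lt, plan, plan->unprunable);
	if (!plan->is_valid)
	{
		toy_unlock(lt, plan->unprunable);
		return false;
	}

	/* Step 2: prune, lock only the survivors, then recheck validity. */
	survivors = toy_prune(plan, param);
	toy_lock(lt, plan, survivors);
	if (!plan->is_valid)
	{
		toy_unlock(lt, survivors);
		toy_unlock(lt, plan->unprunable);
		return false;
	}

	return true;
}
```

In this model a caller would loop: on a false return it discards the plan,
obtains a new one, and tries again, which is roughly the role that
PortalLockCachedPlan() plays in the patch.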
src/backend/commands/explain.c | 45 +++--
src/backend/commands/prepare.c | 30 ++-
src/backend/executor/execMain.c | 142 ++++++++++++++
src/backend/executor/nodeModifyTable.c | 7 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 19 ++
src/backend/tcop/pquery.c | 76 ++++++--
src/backend/utils/cache/plancache.c | 16 ++
src/include/commands/explain.h | 3 +-
src/include/executor/executor.h | 4 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 2 +
src/test/regress/expected/partition_prune.out | 184 ++++++++++++++++++
src/test/regress/expected/plancache.out | 63 ++++++
src/test/regress/sql/partition_prune.sql | 116 +++++++++++
src/test/regress/sql/plancache.sql | 52 +++++
17 files changed, 734 insertions(+), 39 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 112c17b0d64..c5254f0f920 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -377,7 +377,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ NULL);
}
/*
@@ -501,7 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters)
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -532,13 +534,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
/*
* We discard the output if we have no use for it. If we're explaining
* CREATE TABLE AS, we'd better use the appropriate tuple receiver, while
@@ -554,10 +549,34 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
else
dest = None_Receiver;
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ /*
+ * Create a QueryDesc for the query, or use the one provided by the
+ * caller. When reusing a prep QueryDesc, its snapshot was set at
+ * creation time; we push it as active for ExecutorStart and override the
+ * destination and instrument options, which were not known when the
+ * caller created it.
+ */
+ if (prep_qd)
+ {
+ PushActiveSnapshot(GetActiveSnapshot());
+ queryDesc = prep_qd;
+ Assert(queryDesc->dest == None_Receiver);
+ queryDesc->dest = dest;
+ queryDesc->instrument_options = instrument_option;
+ }
+ else
+ {
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+ queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv,
+ instrument_option);
+ }
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 03d7a98fc58..3bbbc052149 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -588,6 +588,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ QueryDesc *prep_qd = NULL;
if (es->memory)
{
@@ -640,8 +641,31 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
pstate->p_queryEnv);
plan_list = cplan->stmt_list;
- if (AcquireExecutorLocks(cplan))
+ if (!CachedPlanCanPrep(cplan, entry->plansource))
+ {
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ continue;
+ }
+
+ prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, plan_list),
+ query_string,
+ GetActiveSnapshot(),
+ InvalidSnapshot,
+ None_Receiver, /* ExplainOnePlan will fix */
+ paramLI,
+ pstate->p_queryEnv,
+ 0 /* ExplainOnePlan will fix */ );
+ if (ExecutorPrepAndLock(prep_qd,
+ CurrentResourceOwner,
+ es->generic ? EXEC_FLAG_EXPLAIN_GENERIC : 0,
+ &cplan->is_valid))
break;
+
+ /* Try again. */
+ ExecutorPrepCleanup(prep_qd);
+ FreeQueryDesc(prep_qd);
ReleaseCachedPlan(cplan, CurrentResourceOwner);
}
@@ -664,6 +688,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
+ Assert(prep_qd == NULL || list_length(plan_list) == 1);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
@@ -671,7 +696,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ prep_qd);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, paramLI);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2b9397b72f3..bbfa0e2b92a 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -333,6 +333,124 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * LockRangeTableRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for ExecutorPrepAndLock() and AcquireExecutorLocksPrepared().
+ */
+static void
+LockRangeTableRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not fail
+ * if it's been dropped entirely --- we'll just transiently acquire a
+ * non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksPrepared
+ *
+ * Acquire or release execution locks using pruning results already computed
+ * by ExecutorPrep() and stored in queryDesc->estate.
+ *
+ * This is intended for single-statement reused generic-plan paths that
+ * choose pruning-aware locking instead of the conservative
+ * AcquireExecutorLocks() path.
+ */
+static void
+AcquireExecutorLocksPrepared(QueryDesc *queryDesc, bool acquire)
+{
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ EState *estate = queryDesc->estate;
+ Bitmapset *lock_relids;
+ ListCell *lc;
+
+ Assert(queryDesc != NULL);
+ Assert(estate != NULL);
+ Assert(plannedstmt != NULL);
+ Assert(plannedstmt->commandType != CMD_UTILITY);
+
+ lock_relids = bms_difference(estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * Keep the first result relation of each ModifyTable locked even if
+ * pruning removed all target partitions. ExecInitModifyTable() relies on
+ * one such relation remaining available.
+ */
+ foreach(lc, plannedstmt->firstResultRels)
+ {
+ Index rti = lfirst_int(lc);
+
+ lock_relids = bms_add_member(lock_relids, rti);
+ }
+
+ LockRangeTableRelids(plannedstmt->rtable, lock_relids, acquire);
+
+ bms_free(lock_relids);
+
+}
+
+/*
+ * ExecutorPrepAndLock
+ * Perform pruning-aware locking for a single PlannedStmt.
+ *
+ * Locks unprunable relations first, then runs ExecutorPrep() to
+ * determine which partitions survive initial pruning, then locks
+ * only those survivors. Checks *is_valid after each locking step
+ * to detect plan invalidation (e.g., from concurrent DDL or DDL
+ * triggered by a pruning expression).
+ *
+ * Returns true if the plan is still valid and all needed locks are
+ * held. Returns false if the plan was invalidated at any point, in
+ * which case all acquired locks have been released and the caller
+ * should discard the QueryDesc and retry with a fresh plan.
+ */
+bool
+ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid)
+{
+ PlannedStmt *pstmt = queryDesc->plannedstmt;
+
+ /* Lock unprunable rels before pruning can access them. */
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, true);
+ if (!*is_valid)
+ {
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ /* Run pruning and lock survivors. */
+ ExecutorPrep(queryDesc, owner, eflags);
+ AcquireExecutorLocksPrepared(queryDesc, true);
+ if (!*is_valid)
+ {
+ AcquireExecutorLocksPrepared(queryDesc, false);
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ return true;
+}
+
/*
* ExecutorPrep
*
@@ -391,6 +509,30 @@ ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
CurrentResourceOwner = oldowner;
}
+/*
+ * ExecutorPrepCleanup
+ * Clean up an EState that was created by ExecutorPrep() but never
+ * passed to ExecutorStart(). This happens when the plan is
+ * invalidated between prep and execution, and the caller must
+ * discard the prepped state before retrying with a fresh plan.
+ *
+ * Unlike ExecutorEnd(), this does not expect a fully initialized
+ * plan state tree -- only the range table relations and the
+ * EState itself need to be freed.
+ */
+void
+ExecutorPrepCleanup(QueryDesc *queryDesc)
+{
+ EState *estate = queryDesc->estate;
+
+ if (estate == NULL)
+ return;
+
+ ExecCloseRangeTableRelations(estate);
+ FreeExecutorState(estate);
+ queryDesc->estate = NULL;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 478cb01783c..6e78b61f700 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -5133,8 +5133,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksPrepared(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -5148,6 +5148,9 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ if (!list_member_int(estate->es_plannedstmt->firstResultRels, rti))
+ elog(ERROR, "first result relation %u not found in firstResultRels",
+ rti);
i = 0;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f4689e7c9f8..4cddac7f2fc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -675,6 +675,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ff0e875f2a2..4495bc6e627 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,25 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of initially
+ * prunable relations. We use bms_next_member() to get the
+ * lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because
+ * expand_inherited_rtentry() adds child partitions to the range table
+ * sequentially in partition bound order, and resultRelations is built
+ * from that same expansion.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 4699b53cab7..53c50ab0fce 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,7 +59,9 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
-static bool PortalLockCachedPlan(Portal portal);
+static bool PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **prep_qd);
/*
@@ -488,21 +490,6 @@ restart:
* non-default nesting level for the snapshot.
*/
- /*
- * If the portal is backed by a cached plan, acquire execution
- * locks via PortalLockCachedPlan(). If the plan is
- * invalidated during locking, it replans and may change the
- * portal strategy, requiring us to restart PortalStart().
- */
- if (portal->cplan)
- {
- if (PortalLockCachedPlan(portal))
- {
- PopActiveSnapshot();
- goto restart;
- }
- }
-
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
@@ -516,6 +503,26 @@ restart:
portal->queryEnv,
0);
+ /*
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). For eligible plans
+ * (single-statement reused generic), this performs
+ * pruning-aware locking: it runs ExecutorPrep() on the
+ * QueryDesc to determine which partitions survive initial
+ * pruning, then locks only those. If the plan is invalidated
+ * during this process, it replans and rebuilds the QueryDesc.
+ * If replanning changes the portal strategy, we must restart
+ * PortalStart() to redispatch.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal, true, params, &queryDesc))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
@@ -555,7 +562,7 @@ restart:
case PORTAL_ONE_MOD_WITH:
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -611,7 +618,7 @@ restart:
*/
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -1828,15 +1835,32 @@ EnsurePortalSnapshotExists(void)
* Acquire execution locks for a cached-plan-backed portal,
* retrying with a fresh plan if the current one is invalidated.
*
+ * If do_prep is true and the plan is eligible (single-statement reused
+ * generic plan), performs pruning-aware locking by running ExecutorPrep()
+ * on the caller-supplied QueryDesc in *prep_qd. Otherwise falls back to
+ * locking all relations in the plan.
+ *
* Returns true if replanning changed portal->strategy, meaning the
- * caller must redispatch. Returns false once locks are held.
+ * caller must redispatch. Returns false once locks are held and the
+ * plan is valid for execution.
*/
static bool
-PortalLockCachedPlan(Portal portal)
+PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **prep_qd)
{
PortalStrategy start_strategy = portal->strategy;
- if (AcquireExecutorLocks(portal->cplan))
+ if (do_prep && CachedPlanCanPrep(portal->cplan, portal->plansource))
+ {
+ Assert(prep_qd);
+ if (ExecutorPrepAndLock(*prep_qd, portal->resowner, 0,
+ &portal->cplan->is_valid))
+ return false;
+ ExecutorPrepCleanup(*prep_qd);
+ FreeQueryDesc(*prep_qd);
+ }
+ else if (AcquireExecutorLocks(portal->cplan))
return false;
/* Replan. Locks will be taken freshly. */
@@ -1852,5 +1876,15 @@ PortalLockCachedPlan(Portal portal)
if (portal->strategy != start_strategy)
return true;
+ if (prep_qd)
+ {
+ Assert(list_length(portal->stmts) == 1);
+ *prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->sourceText,
+ GetActiveSnapshot(), InvalidSnapshot,
+ None_Receiver, params,
+ portal->queryEnv, 0);
+ }
+
return false;
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index f7fe366859c..fca2f84081e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1977,6 +1977,22 @@ AcquireExecutorLocks(CachedPlan *cplan)
return true;
}
+/*
+ * CachedPlanCanPrep
+ * Check whether a cached plan is eligible for pruning-aware locking
+ * via ExecutorPrepAndLock().
+ *
+ * Only single-statement reused generic plans with a non-utility command
+ * qualify.
+ */
+bool
+CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource)
+{
+ return (cplan == plansource->gplan &&
+ list_length(cplan->stmt_list) == 1 &&
+ linitial_node(PlannedStmt, cplan->stmt_list)->commandType != CMD_UTILITY);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..3a03355e6b6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -69,7 +69,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters);
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 33bbdbfeffb..093be9bd24b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid);
+extern void ExecutorPrepCleanup(QueryDesc *queryDesc);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 27a2c6815b7..a5d00633b4b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 14a1dfed2b9..1a328ea138c 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -120,6 +120,16 @@ typedef struct PlannedStmt
/* RT indexes of relations targeted by INSERT/UPDATE/DELETE/MERGE */
Bitmapset *resultRelationRelids;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksPrepared() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index e0fc403e717..2941d3a301b 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -254,4 +254,6 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+extern bool CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource);
+
#endif /* PLANCACHE_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 849049f9c51..ec73866486e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4956,3 +4956,187 @@ select * from (select a, b from phv_boolpart) t
(2 rows)
drop table phv_boolpart;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+ a | b
+---+---
+ 1 | 0
+ 2 | 1
+(2 rows)
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index d58534ca1cd..54077294dce 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -402,3 +402,66 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 359a9208056..a98844d14f8 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1518,3 +1518,119 @@ select * from (select a, b from phv_boolpart) t
group by grouping sets (a, b);
drop table phv_boolpart;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index aed388d03a1..90b6c5f82bf 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -228,3 +228,55 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
* Re: generic plans and "initial" pruning
@ 2026-05-29 10:30 Thom Brown <[email protected]>
parent: Amit Langote <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Thom Brown @ 2026-05-29 10:30 UTC (permalink / raw)
To: Amit Langote <[email protected]>; +Cc: Chao Li <[email protected]>; Tom Lane <[email protected]>; Tender Wang <[email protected]>; Alexander Lakhin <[email protected]>; Tomas Vondra <[email protected]>; Robert Haas <[email protected]>; Alvaro Herrera <[email protected]>; Andres Freund <[email protected]>; Daniel Gustafsson <[email protected]>; David Rowley <[email protected]>; pgsql-hackers
On Fri, 29 May 2026 at 09:57, Amit Langote <[email protected]> wrote:
>
> On Thu, May 28, 2026 at 10:14 PM Thom Brown <[email protected]> wrote:
> > On Thu, 28 May 2026 at 09:14, Amit Langote <[email protected]> wrote:
> > > It's a real bug.
> > >
> > > You're right that if PortalLockCachedPlan() replans, the QueryDesc
> > > created before the call still points at the old PlannedStmt from the
> > > released plan. And yes, 0004 happens to fix it by rebuilding the
> > > QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
> > > broken on their own.
> > >
> > > Attached is an updated set with the fix: CreateQueryDesc now runs
> > > after PortalLockCachedPlan() returns, as you suggested. That said,
> > > I'll probably focus first on settling the plancache refactoring that
> > > spun off from this thread [1], and then start a new thread for the
> > > pruning-aware locking work on top of it, incorporating parts of this
> > > series.
> >
> > Thanks.
> >
> > I've done another pass. I see a reference to
> > AcquireExecutorLocksUnpruned(), but I can't find this function. Is
> > this supposed to be AcquireExecutorLocksPrepared()?
>
> You're right, stale comment. It should say
> AcquireExecutorLocksPrepared(). Fixed.
>
> > And also I have a question about the new firstResultRels code
> >
> > If I've followed it right, the bit in setrefs.c records the
> > lowest-numbered RT index from leaf_result_relids as the
> > per-ModifyTable fallback that's used when all real targets get pruned
> > away, and the executor side looks it up via
> > linitial_int(node->resultRelations). For that to work those two have
> > to pick the same RT index, and the comment justifies it with
> > "partition expansion preserves RT index order". Where is that
> > preservation guaranteed?
>
> The ordering comes from expand_inherited_rtentry(), which adds child
> partitions to the range table sequentially in partition bound order.
> Since ModifyTable.resultRelations is built from the same expansion,
> its first element is the lowest-numbered RT index among the leaf
> partitions for that node. That is the same value
> bms_next_member(leaf_result_relids, -1) returns from the Bitmapset,
> because Bitmapset iteration returns members in ascending order. I've
> added a comment in setrefs.c pointing to expand_inherited_rtentry() as
> the source of this guarantee.
>
> > And with the assertion in ExecInitModifyTable:
> >
> > Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
> >
> > With writable CTEs producing more than one ModifyTable node the list
> > has several entries, so all the assert really checks is that some
> > recorded entry matches, not that the one recorded for this particular
> > node matches. If that's correct, then in a case where the wrong entry
> > happened to line up the right relation wouldn't be locked and nothing
> > would complain. Is there something that keeps these in order
> > somewhere?
>
> This is a fair observation -- the Assert checks membership in the
> global list rather than per-node correspondence. But node A's rti
> can't accidentally pass the Assert by matching an entry recorded for
> node B. Each ModifyTable node gets its own partition expansion with
> distinct RT entries. In a writable CTE like:
>
> WITH upd1 AS (UPDATE t SET ...),
> upd2 AS (UPDATE t SET ...)
> UPDATE t SET ...
>
> each UPDATE creates a separate set of leaf partition RT entries --
> upd1 might get RT indexes 5,6,7, upd2 gets 8,9,10, and the main UPDATE
> gets 11,12,13. The global firstResultRels list would be [5, 8, 11].
> When ExecInitModifyTable falls back to linitial_int(resultRelations)
> for a given node, it finds that node's own entry, because the RT index
> sets are disjoint across nodes.
>
> That said, it's worth being explicit about what protections exist at
> each layer, since this is safety-critical code:
>
> 1. AcquireExecutorLocksPrepared(), added by 0004, locks every entry in
> firstResultRels unconditionally. So regardless of which rti a
> ModifyTable node falls back to, the relation will be locked.
>
> 2. ExecGetRangeTableRelation() has two checks when opening a relation.
> For non-result relations (isResultRel=false), it checks
> es_unpruned_relids and raises an ERROR in release builds if the
> relation was pruned. For result relations (isResultRel=true), that
> check is intentionally skipped -- it has to be, because at least one
> result relation per ModifyTable node must remain openable even when
> all partitions are pruned, since executor code paths like ExecMerge()
> and ExecInitPartitionInfo() rely on resultRelInfo[0] being initialized
> (see commit 28317de723b). The remaining protection for result
> relations is Assert(CheckRelationLockedByMe()) inside table_open,
> which fires in debug builds.
>
> 3. I've tightened ExecInitModifyTable to close this gap: the
> all-pruned fallback path now raises an elog(ERROR) in release builds
> if linitial_int(resultRelations) is not found in firstResultRels,
> rather than just an Assert. This gives result relations a
> production-visible check comparable to what es_unpruned_relids
> provides for scan relations.
>
> So the net effect is that for scan relations, opening a
> pruned-and-unlocked relation is caught by an ERROR in production via
> es_unpruned_relids. For result relations on the all-pruned fallback
> path, it's now also caught by an ERROR in production via the
> firstResultRels check in ExecInitModifyTable. The locking in
> AcquireExecutorLocksPrepared() ensures the relation is always locked
> regardless.
>
> Thanks again for the review. A close look at these aspects by someone
> other than me is very useful.

Ah, the disjoint RT-entries point is what I was missing. I'd been
reading firstResultRels as a flat list where in theory any entry could
line up with any node's lookup, which is what made the assert feel
potentially insufficient. If each ModifyTable's expansion produces its
own non-overlapping set of leaf RT indexes then membership in the
global list really is equivalent to membership in this node's own
entry, and the assert is sufficient as it stands. Walking through the
writable-CTE case helped.

Thanks

Thom
* Re: generic plans and "initial" pruning
@ 2026-06-02 17:54 Ilmar Yunusov <[email protected]>
parent: Thom Brown <[email protected]>
0 siblings, 1 reply; 82+ messages in thread
From: Ilmar Yunusov @ 2026-06-02 17:54 UTC (permalink / raw)
To: [email protected]; +Cc: Amit Langote <[email protected]>
The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, failed
Spec compliant: not tested
Documentation: not tested

Hi,

I looked at v13, focusing on apply/build status and relation-lock behavior for
reused generic plans after initial partition pruning.

I used the v13 series from Amit's 2026-05-29 message, on origin/master at
4b0bf0788b066a4ca1d4f959566678e44ec93422.

The series applies cleanly with git am, and git diff --check reports no
issues.

I first built with:

    ./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu
    make -s -j8
    make -s install

    make -C src/test/regress check

passed; all 245 tests passed, including plancache and partition_prune.

I also built a cassert/debug tree with:

    ./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu --enable-cassert --enable-debug 'CFLAGS=-O0 -g'
    make -s -j8
    make -s install

and ran:

    make -C src/test/regress check

which also passed; all 245 tests passed.

For the lock behavior, I used a list-partitioned table with force_generic_plan.
After the generic plan had been built and then reused, EXECUTE held only the
matching child partition lock. For example, EXECUTE q(1) held only the
following child lock:

    manual_prunelock_p1

EXPLAIN EXECUTE behaved the same way on a reused generic plan; EXPLAIN EXECUTE
q(2) removed the other subplans and held only the following child lock:

    manual_prunelock_p2

With enable_partition_pruning = off and a newly prepared statement, executing
the same SELECT held all child partition locks:

    manual_prunelock_p1, manual_prunelock_p2, manual_prunelock_p3
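
The manual check described above could be reproduced along the following
lines. This is only a sketch: the review gives the child names
manual_prunelock_p1..p3 and the statement name q, but the parent table name,
the column, and the partition bounds below are assumptions modelled on the
prunelock_p regression test in the patch.

```sql
-- Hypothetical reproduction; table definition and bounds are assumptions.
set plan_cache_mode to force_generic_plan;

create table manual_prunelock_p (a int) partition by list (a);
create table manual_prunelock_p1 partition of manual_prunelock_p for values in (1);
create table manual_prunelock_p2 partition of manual_prunelock_p for values in (2);
create table manual_prunelock_p3 partition of manual_prunelock_p for values in (3);

prepare q (int) as select * from manual_prunelock_p where a = $1;

-- First execution builds and caches the generic plan.
execute q(1);

-- Second execution reuses the cached plan.  List the child partition
-- locks this backend holds while the transaction is still open.
begin;
execute q(1);
select c.relname
  from pg_locks l
  join pg_class c on c.oid = l.relation
 where l.pid = pg_backend_pid()
   and c.relname like 'manual_prunelock_p_'
 order by c.relname;
commit;

-- Same check with pruning disabled and a newly prepared statement.
deallocate q;
set enable_partition_pruning to off;
prepare q (int) as select * from manual_prunelock_p where a = $1;
execute q(1);

begin;
execute q(1);
select c.relname
  from pg_locks l
  join pg_class c on c.oid = l.relation
 where l.pid = pg_backend_pid()
   and c.relname like 'manual_prunelock_p_'
 order by c.relname;
commit;

-- Clean up.
deallocate q;
reset enable_partition_pruning;
drop table manual_prunelock_p;
reset plan_cache_mode;
```

Going by the report, a server built with v13 should list only
manual_prunelock_p1 in the first lock query and all three children in the
second. The trailing underscore in the LIKE pattern matches exactly one
character, so the parent table itself is filtered out.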

I also ran a bounded cassert/debug stress check around plan invalidation. It
did 20 cycles where a child index was created and dropped before EXECUTE, and
20 similar cycles before EXPLAIN EXECUTE. In each cycle, the first execution
after invalidation/replanning held all child partition locks, and the next
execution reusing the generic plan held only the matching child partition lock.
That matches my reading that the patch is reducing locks for reused generic
plans, not for the execution that has to rebuild the plan.
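
One cycle of such an invalidation check might look like the sketch below. The
review does not show its script, so the table definition, the choice of child
for the index, the index name, and the helper view held_child_locks are all
assumptions introduced here for illustration.

```sql
-- Hypothetical single invalidation cycle; all definitions are assumptions.
set plan_cache_mode to force_generic_plan;

create table manual_prunelock_p (a int) partition by list (a);
create table manual_prunelock_p1 partition of manual_prunelock_p for values in (1);
create table manual_prunelock_p2 partition of manual_prunelock_p for values in (2);
create table manual_prunelock_p3 partition of manual_prunelock_p for values in (3);

-- Helper view (hypothetical) so the lock query is not repeated.
create temp view held_child_locks as
  select c.relname
    from pg_locks l
    join pg_class c on c.oid = l.relation
   where l.pid = pg_backend_pid()
     and c.relname like 'manual_prunelock_p_'
   order by c.relname;

prepare q (int) as select * from manual_prunelock_p where a = $1;

-- Build and cache the generic plan.
execute q(1);

-- Invalidate the cached plan by creating and dropping an index on a child.
create index manual_prunelock_p2_a_idx on manual_prunelock_p2 (a);
drop index manual_prunelock_p2_a_idx;

-- First execution after invalidation has to rebuild the plan.
begin;
execute q(1);
select * from held_child_locks;
commit;

-- Next execution reuses the rebuilt generic plan.
begin;
execute q(1);
select * from held_child_locks;
commit;

-- Clean up.
deallocate q;
drop view held_child_locks;
drop table manual_prunelock_p;
reset plan_cache_mode;
```

Per the review, the first lock listing should show all three children and the
second only manual_prunelock_p1. Repeating the index create/drop and the two
transactions gives the bounded loop described above.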

One behavior I wanted to confirm: prepared UPDATE execution still held all
child partition locks in my manual check, including on the second execution
where the generic plan was being reused.

The test was:

    prepare upd(int, text) as
    update stress_prunelock_p set b = $2 where a = $1;

Then both:

    execute upd(3, 'updated-row-3');

and an all-pruned value:

    execute upd(99, 'no-row');

held:

    stress_prunelock_p1, stress_prunelock_p2, stress_prunelock_p3,
    stress_prunelock_p4

pg_prepared_statements showed generic_plans increasing for this prepared
statement, so this was not a custom-plan case.

Is this expected for ModifyTable/result relations in v13, or did I miss an
eligibility condition that prevents pruning-aware locking from being used for
this prepared UPDATE case? I saw the recent firstResultRels discussion, but I
was not sure whether those changes are intended only to make pruned
result-relation initialization safe, or whether actual prepared DML execution
is expected to see reduced child partition locking as well.
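
A possible way to reproduce the prepared UPDATE observation is sketched below.
The prepared statement and the child names come from the report; the column
types, the partition bounds (1 through 4), and the seed rows are assumptions.

```sql
-- Hypothetical reproduction; column types, bounds and data are assumptions.
set plan_cache_mode to force_generic_plan;

create table stress_prunelock_p (a int, b text) partition by list (a);
create table stress_prunelock_p1 partition of stress_prunelock_p for values in (1);
create table stress_prunelock_p2 partition of stress_prunelock_p for values in (2);
create table stress_prunelock_p3 partition of stress_prunelock_p for values in (3);
create table stress_prunelock_p4 partition of stress_prunelock_p for values in (4);
insert into stress_prunelock_p values (1, 'r1'), (2, 'r2'), (3, 'r3'), (4, 'r4');

prepare upd(int, text) as
  update stress_prunelock_p set b = $2 where a = $1;

-- First execution builds and caches the generic plan.
execute upd(3, 'updated-row-3');

-- Reused generic plan, matching value.
begin;
execute upd(3, 'updated-row-3');
select c.relname, l.mode
  from pg_locks l
  join pg_class c on c.oid = l.relation
 where l.pid = pg_backend_pid()
   and c.relname like 'stress_prunelock_p_'
 order by c.relname, l.mode;
commit;

-- Reused generic plan, value that matches no partition.
begin;
execute upd(99, 'no-row');
select c.relname, l.mode
  from pg_locks l
  join pg_class c on c.oid = l.relation
 where l.pid = pg_backend_pid()
   and c.relname like 'stress_prunelock_p_'
 order by c.relname, l.mode;
commit;

-- Confirm that generic plans, not custom plans, were used.
select name, generic_plans, custom_plans
  from pg_prepared_statements
 where name = 'upd';

-- Clean up.
deallocate upd;
drop table stress_prunelock_p;
reset plan_cache_mode;
```

According to the report, both lock listings show all four children under v13,
while generic_plans keeps increasing for upd.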

I did not review the broader plancache refactoring design, did not run
installcheck-world, and did not test concurrent DDL from a separate session.

Regards,
Ilmar Yunusov

The new status of this patch is: Waiting on Author
* Re: generic plans and "initial" pruning
@ 2026-06-04 00:25 Amit Langote <[email protected]>
parent: Ilmar Yunusov <[email protected]>
0 siblings, 0 replies; 82+ messages in thread
From: Amit Langote @ 2026-06-04 00:25 UTC (permalink / raw)
To: Ilmar Yunusov <[email protected]>; +Cc: [email protected]
Hi Ilmar,

On Wed, Jun 3, 2026 at 2:55 AM Ilmar Yunusov <[email protected]> wrote:
>
> I looked at v13, focusing on apply/build status and relation-lock behavior for
> reused generic plans after initial partition pruning.
>
> I used the v13 series from Amit's 2026-05-29 message, on origin/master at
> 4b0bf0788b066a4ca1d4f959566678e44ec93422.
>
> The series applies cleanly with git am, and git diff --check reports no
> issues.
>
> I first built with:
>
> ./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu
> make -s -j8
> make -s install
>
> make -C src/test/regress check
>
> passed; all 245 tests passed, including plancache and partition_prune.
>
> I also built a cassert/debug tree with:
>
> ./configure --prefix="$PWD/pg-install" --without-readline --without-zlib --without-icu --enable-cassert --enable-debug 'CFLAGS=-O0 -g'
> make -s -j8
> make -s install
>
> and ran:
>
> make -C src/test/regress check
>
> which also passed; all 245 tests passed.
>
> For the lock behavior, I used a list-partitioned table with force_generic_plan.
> After the generic plan had been built and then reused, EXECUTE held only the
> matching child partition lock. For example, EXECUTE q(1) held only the
> following child lock:
>
> manual_prunelock_p1
>
> EXPLAIN EXECUTE behaved the same way on a reused generic plan; EXPLAIN EXECUTE
> q(2) removed the other subplans and held only the following child lock:
>
> manual_prunelock_p2
>
> With enable_partition_pruning = off and a newly prepared statement, executing
> the same SELECT held all child partition locks:
>
> manual_prunelock_p1, manual_prunelock_p2, manual_prunelock_p3
>
> I also ran a bounded cassert/debug stress check around plan invalidation. It
> did 20 cycles where a child index was created and dropped before EXECUTE, and
> 20 similar cycles before EXPLAIN EXECUTE. In each cycle, the first execution
> after invalidation/replanning held all child partition locks, and the next
> execution reusing the generic plan held only the matching child partition lock.
> That matches my reading that the patch is reducing locks for reused generic
> plans, not for the execution that has to rebuild the plan.

Thanks for the thorough testing.

> One behavior I wanted to confirm: prepared UPDATE execution still held all
> child partition locks in my manual check, including on the second execution
> where the generic plan was being reused.
>
> The test was:
>
> prepare upd(int, text) as
> update stress_prunelock_p set b = $2 where a = $1;
>
> Then both:
>
> execute upd(3, 'updated-row-3');
>
> and an all-pruned value:
>
> execute upd(99, 'no-row');
>
> held:
>
> stress_prunelock_p1, stress_prunelock_p2, stress_prunelock_p3,
> stress_prunelock_p4
>
> pg_prepared_statements showed generic_plans increasing for this prepared
> statement, so this was not a custom-plan case.
>
> Is this expected for ModifyTable/result relations in v13, or did I miss an
> eligibility condition that prevents pruning-aware locking from being used for
> this prepared UPDATE case? I saw the recent firstResultRels discussion, but I
> was not sure whether those changes are intended only to make pruned
> result-relation initialization safe, or whether actual prepared DML execution
> is expected to see reduced child partition locking as well.

Yes, this is expected; the pruning-aware path currently only kicks in
for the portal strategy used by SELECT. I hadn't noticed that
UPDATE/DELETE ends up on a different strategy that bypasses the new
pruning-aware locking path. I need to think about how best to handle
this; the DML portal strategies defer executor startup to a later
point, so it may require some restructuring.

--
Thanks, Amit Langote