From: Amit Langote <[email protected]>
To: Thom Brown <[email protected]>
Cc: Chao Li <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: Tender Wang <[email protected]>
Cc: Alexander Lakhin <[email protected]>
Cc: Tomas Vondra <[email protected]>
Cc: Robert Haas <[email protected]>
Cc: Alvaro Herrera <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Daniel Gustafsson <[email protected]>
Cc: David Rowley <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Subject: Re: generic plans and "initial" pruning
Date: Thu, 28 May 2026 17:13:53 +0900
Message-ID: <CA+HiwqGqAHhJmJn5=n9363R8UkcTpu3Uxj4Q2DmuG527ERDt8A@mail.gmail.com>
In-Reply-To: <CAA-aLv5+dSSQ7KKZPgnysnFOTEXkFKYbeqSWk5Qu61_3Vd8aJw@mail.gmail.com>
References: <CA+HiwqFpZ80UJKr4tZus4Omgg7YESzFXKSwSHRW2Ap2=XSVyUA@mail.gmail.com>
<[email protected]>
<CA+HiwqH8N-SxEB6SysEBsYNgV_KJs66k9Z2SNmqVzbBP-60yWg@mail.gmail.com>
<[email protected]>
<CA+HiwqEmG9YCQvG6uux7sO=jKFSAW6hA4Ea-ymfD+JhJAe4PWQ@mail.gmail.com>
<CA+HiwqE2FfJfH=siLiR3kJ13tmXZORAGTWsZc2r52o1_5BDv+g@mail.gmail.com>
<[email protected]>
<CA+HiwqFhkpXHAA=4NY5SqYXX08uq=nYtXcSByNZF=2MAy1UA7A@mail.gmail.com>
<CA+HiwqHCcSoYfpMjFshaU1bj6NjreiDvMSDpVSeBmqk-kbWrPw@mail.gmail.com>
<CA+HiwqHOejJk0_qMuM5g38h70hY_JvHMAKwnH3k=urfTXauPQA@mail.gmail.com>
<CA+HiwqFsGKM82oaMby3VWYXf_XFpDAMeT+6SXgj-45HpTrS1dA@mail.gmail.com>
<CA+HiwqFA5hUWYktt3VMh4zQOYMxqH-MpdX8eemfM+o-9dY-zbQ@mail.gmail.com>
<CA+HiwqEn7bbUXaXO=SmUujBjJSHfS31cwQroHRBwT0sR=66bgg@mail.gmail.com>
<CA+HiwqGGLDTd1ZTK1c0zv4La7XOVSVMqBuNtscJeh6FyUQvFvA@mail.gmail.com>
<CA+HiwqE2JFiqqrXdyJVQWY-fMGwzDkLqjXQdUKbPaCpDpxd_2g@mail.gmail.com>
<CA+HiwqFp3jZGSz==QjeuV62_62F6+V6b62=Uqvy99sW_gsgWBA@mail.gmail.com>
<CAHewXNkUz9XGG8nnoxZaw35e+5bQVVP=eeJE4cW4V2e+P9ndFw@mail.gmail.com>
<CA+HiwqFKSpfYruzcVz-5CcFxg7gMa+ycXjMa2aPz_J_P4LGXTg@mail.gmail.com>
<[email protected]>
<CA+HiwqEQ1oME-hcDXwC9rGQb=u7MdUFG3Sc=Qg27uH480v10FA@mail.gmail.com>
<[email protected]>
<CA+HiwqGXMLSQyJvynWF40yNwBAx-pXtxemReP8L+C+kaUa5v5A@mail.gmail.com>
<CA+HiwqGBfMgcxokEH_mg6s=ttLFm54dj4hT6yXydU2t0g6oQ3g@mail.gmail.com>
<CA+HiwqEEkGfMc_LSJhfz96o-czVS4B59Vhw6i1_t58ZGqhP8VA@mail.gmail.com>
<CA+HiwqHAd+9nptjxP6=KrcKA1BMsS6pbB3B2oaojwdyH_wBWCA@mail.gmail.com>
<CA+HiwqE7_YpU--EsrhvNqcZ+10+92EGFaX5609AUJb9ENLntnQ@mail.gmail.com>
<CA+HiwqEF9SgKyQ1HrYOURpv8DGRGHDNwBT9Y6yEBVCW+=kh_=w@mail.gmail.com>
<CA+HiwqFpEHBjosRackQhm6yKKnHgqm8Ewkn=qsctT1N0PqVSrg@mail.gmail.com>
<CA+HiwqGJP91Qed0EjuB72Lv4_QAiVOMYjya7GA0aas8K6NZUZA@mail.gmail.com>
<[email protected]>
<CA+HiwqE7LDSoaF024Mt9v1Gt-uE-WoT9GawC5ds45SaPczV8Qw@mail.gmail.com>
<CA+HiwqGn38DsKgMYKWZ6jyv3_oqCSB0j+XucTjNM0S+BFsQpVA@mail.gmail.com>
<CA+HiwqGFNe7kBkKZm0KtG_CFfw-ciK659SJMGP0CWVaa2q8rmw@mail.gmail.com>
<CA+HiwqELAcgVg_3Gb4VTOpC6wcNhHP0m-8OJFG0MeGRo0M=_4Q@mail.gmail.com>
<CA+HiwqHBxDL=3qQa1f-sBOBZqB88EiVAiagXF3X8Kagpr6Yhpw@mail.gmail.com>
<CA+HiwqFx0kmGqSDcLrE37KkHS2T9O1NoBitZT4mA4yJBBt_QjA@mail.gmail.com>
<CA+HiwqGq=xQvE0oCeOX_oXWq2iyNs5q9UwopyQ2uXF2kJPXTDg@mail.gmail.com>
<CA+HiwqHN9x7ufTz3EfAA3-Zq3NOTeZMKtBatmevMesybwBUaAw@mail.gmail.com>
<CA+HiwqGAT8jKSgjsfPvW2Ft=5xWCCfq05j9=jJKxP34Qqe68Pg@mail.gmail.com>
<CAA-aLv5+dSSQ7KKZPgnysnFOTEXkFKYbeqSWk5Qu61_3Vd8aJw@mail.gmail.com>
Hi Thom,
On Wed, May 27, 2026 at 9:03 PM Thom Brown <[email protected]> wrote:
>
> On Sat, 4 Apr 2026 at 13:11, Amit Langote <[email protected]> wrote:
> >
> > Attached is a redesigned version. While working on the previous
> > design, I grew increasingly uncomfortable with CachedPlanPrepData --
> > it was smuggling executor state out of GetCachedPlan() through an
> > out-parameter, which papered over the real problem: GetCachedPlan()
> > was doing too much. The main change in this version is architectural:
> > GetCachedPlan() no longer acquires execution locks. Callers now own
> > that responsibility, which is natural because each call site iterates
> > stmt_list differently and manages execution state in its own way --
> > and it lets them choose between conservative lock-all and
> > pruning-aware locking where appropriate.
> >
> > Non-portal call sites remain on the conservative path for now.
> > _SPI_execute_plan requires care around snapshot setup, which happens
> > after plan fetch rather than before. SQL functions have a different
> > issue: init_execution_state() fetches the plan while postquel_start()
> > handles execution, with execution_state containers in between, making
> > it harder to thread a prepped QueryDesc through. The portal path and
> > EXPLAIN EXECUTE cover the most common
> > prepared-statement-with-partitions workloads; the remaining sites can
> > be converted incrementally.
> >
> > This is now starting to feel closer to what Tom suggested back in
> > January 2023 [1], where he proposed getting rid of
> > AcquireExecutorLocks() inside GetCachedPlan() entirely and pushing
> > lock acquisition out to callers. He noted that "we'd be pushing the
> > responsibility for looping back and re-planning out to fairly
> > high-level calling code" and that "we'd definitely be changing some
> > fundamental APIs." That is the direction I came around to over the
> > last couple of weeks while wrestling with CachedPlanPrepData. The
> > reverted approach also tried to follow Tom's direction but moved
> > locking into ExecutorStart(), which forced it to handle plan
> > invalidation from inside the executor by mutating the CachedPlan
> > in-place. This version moves locking out to the callers instead, so
> > the executor and plan cache never reach into each other.
> >
> > The series is now four patches:
> >
> > 0001: Move execution lock acquisition out of GetCachedPlan(). Adds
> > AcquireExecutorLocks() as a caller-facing function with validity check
> > and retry. Adds PortalLockCachedPlan() in pquery.c to centralize the
> > portal retry logic. All callers are converted. No behavioral change.
> >
> > 0002: Refactor executor's initial partition pruning setup. Cleanup
> > only, no behavioral change.
> >
> > 0003: Introduce ExecutorPrep() and refactor executor startup. Factors
> > range table init, permission checks, and initial pruning out of
> > InitPlan(). Scaffolding for 0004; all callers still go through the
> > normal ExecutorStart() path.
> >
> > 0004: Use pruning-aware locking for single-statement cached plans.
> > Adds ExecutorPrepAndLock() which locks unprunable relations, runs
> > ExecutorPrep() to determine surviving partitions, then locks only
> > those. Extends PortalLockCachedPlan() with a pruning-aware path for
> > eligible plans. Multi-statement CachedPlans (from rule rewriting)
> > always use conservative locking. In principle, this could be relaxed
> > if the planner can prove that no pruning expression reads state
> > modified by an earlier statement, but that is left for a future patch.
> > Includes regression tests.
> >
> > In case it's not clear, I'm not targeting v19 at this point. I'd like
> > to get this into v20 CF1 and would welcome review from anyone
> > interested.
>
> After not having looked at this in close to 2 years, I thought I'd
> give it another look.
Thanks for taking a look.
> Not found any user-facing issues, and I'm liking
> seeing so few locks in pg_locks. I can see that with pruning disabled,
> the fallback works, pruning-aware locking is working via SPI through
> plpgsql, running ALTER between executions and also invalidating
> indexes force replans, and it's looking good.
>
> But I also think there might be a bug in patch 0001, but I'd
> appreciate checking my reasoning because I'm not fully confident I've
> been diligent enough.
>
> When PortalStart() opens a SELECT cursor that's backed by a cached
> plan, it does roughly the following. It builds a queryDesc (an
> executor-side struct), one of whose fields is a pointer into the plan
> tree inside the portal's cached plan. Then it calls
> PortalLockCachedPlan() to acquire the necessary locks, and finally
> hands the queryDesc over to the executor.
>
> My worry is about what happens if the cached plan turns out to be
> stale, for instance because someone ran DDL on a referenced table. In
> that case PortalLockCachedPlan() throws the old plan away (via
> ReleaseCachedPlan) and fetches a freshly-built replacement, updating
> the portal's own pointers to match. But the queryDesc from earlier
> isn't touched. Its plan pointer still references the old, now-released
> plan. From what I can see, once that old plan's last reference is
> dropped its memory can be freed, which would leave the executor
> reading from freed memory in the next step.
>
> The bit I'm least sure about is whether the old plan's memory really
> does get reclaimed straight away when its refcount hits zero. If
> something keeps it alive longer then this isn't a bug, or at least not
> as bad as I'm making out. I had a look but couldn't convince myself
> either way from the code alone. To actually hit this you'd need a
> cursor on a cached plan, plus an invalidation arriving in the small
> window between the portal being set up and the cursor being opened.
> The race condition is brief, and I've not been able to hit it in
> testing.
>
> The thing that got me thinking this is real: patch 0004 modifies
> PortalLockCachedPlan() so that whenever it replans, it also rebuilds
> the queryDesc. That's pretty much the fix I'd expect for this, which
> makes me suspect somebody hit it at some point. But 0004 only applies
> that fix on the new pruning-aware code path, and it was mentioned in
> the thread that 0001 to 0003 might land before 0004. If so, master
> would carry the bug in the gap between the two.
>
> I suspect a way to deal with it would be to move the CreateQueryDesc
> call in the SELECT case to after PortalLockCachedPlan() returns, which
> is what the other portal strategies already seem to do. Alternatively,
> you could bring 0004's changes in this area into 0001 and have
> PortalLockCachedPlan() always rebuild the queryDesc when it replans.
>
> If I've got this wrong and there's some lifetime mechanism I missed
> that keeps the old plan's memory alive, then it's a non-issue and I'm
> misreading the code. If I have got it wrong, could you please add
> comments to make what is going on clearer?

It's a real bug.

You're right that if PortalLockCachedPlan() replans, the QueryDesc
created before the call still points at the old PlannedStmt from the
released plan. And yes, 0004 happens to fix it by rebuilding the
QueryDesc inside PortalLockCachedPlan(), but 0001 through 0003 are
broken on their own.

Attached is an updated set with the fix: CreateQueryDesc now runs
after PortalLockCachedPlan() returns, as you suggested. That said,
I'll probably focus first on settling the plancache refactoring that
spun off from this thread [1], and then start a new thread for the
pruning-aware locking work on top of it, incorporating parts of this
series.

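To make the ordering issue concrete, here is a toy C model of it. This
is not PostgreSQL code: ToyPlan, ToyPortal, ToyQueryDesc and all toy_*
functions are invented for illustration, and the real structs and
refcounting are considerably more involved. The sketch only shows why
a descriptor that borrows a plan pointer should be built after the
step that may release and replace the plan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins; none of these are the PostgreSQL structs. */
typedef struct ToyPlan
{
    int  refcount;
    bool is_valid;
    int  generation;    /* bumped on each replan */
} ToyPlan;

typedef struct ToyPortal
{
    ToyPlan *cplan;     /* the portal owns one reference */
} ToyPortal;

typedef struct ToyQueryDesc
{
    ToyPlan *plan;      /* borrowed pointer, no reference of its own */
} ToyQueryDesc;

int toy_plans_freed = 0;

/* Hypothetical stand-in for fetching a plan; returns one reference. */
ToyPlan *
toy_get_plan(int generation)
{
    ToyPlan *p = malloc(sizeof(ToyPlan));

    if (p == NULL)
        abort();
    p->refcount = 1;
    p->is_valid = true;
    p->generation = generation;
    return p;
}

/* In this toy, memory is freed as soon as the last reference is dropped. */
void
toy_release_plan(ToyPlan *p)
{
    if (--p->refcount == 0)
    {
        toy_plans_freed++;
        free(p);
    }
}

/* If the plan is stale, release it and point the portal at a new one. */
void
toy_portal_lock_plan(ToyPortal *portal)
{
    int next_generation;

    if (portal->cplan->is_valid)
        return;
    next_generation = portal->cplan->generation + 1;
    toy_release_plan(portal->cplan);    /* old plan is freed here */
    portal->cplan = toy_get_plan(next_generation);
}

/*
 * Safe ordering: lock (and possibly replan) first, then build the
 * descriptor, so it can only ever capture the surviving plan.
 */
ToyQueryDesc
toy_portal_start(ToyPortal *portal)
{
    ToyQueryDesc qd;

    toy_portal_lock_plan(portal);
    qd.plan = portal->cplan;
    return qd;
}
```

Had toy_portal_start() filled in qd.plan before calling
toy_portal_lock_plan(), an invalidated plan would leave qd.plan
pointing at freed memory, which is the shape of the problem discussed
above.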
--
Thanks, Amit Langote
[1] https://www.postgresql.org/message-id/CA%2BHiwqE1ntHy2h9zJ9v3MwAkoGAveSERcHWkDTTZnP0kxWqbKQ%40mail.g...
Attachments:
[application/octet-stream] v12-0001-Move-execution-lock-acquisition-out-of-GetCached.patch (16.2K, 2-v12-0001-Move-execution-lock-acquisition-out-of-GetCached.patch)
From a3214580f2ce1983a111af07ccb092ba03c812c8 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 18:38:34 +0900
Subject: [PATCH v12 1/4] Move execution lock acquisition out of
GetCachedPlan()
GetCachedPlan() previously acquired execution locks on all plan
relations as part of cached plan validation. Move this
responsibility to callers, making GetCachedPlan() return a valid
plan without holding execution locks.
Add AcquireExecutorLocks() as the caller-facing function: it locks
all relations in the plan, checks that the plan is still valid
afterward, and returns false if it was invalidated so the caller
can retry with a fresh plan.
For portal-backed callers, add PortalLockCachedPlan() in pquery.c
which wraps the lock-check-retry loop and handles the case where
replanning changes the portal strategy. Store the CachedPlanSource
pointer in PortalData so retry can call GetCachedPlan() without
the caller threading it through.
Adjust all non-portal GetCachedPlan() callers (SPI, EXPLAIN
EXECUTE, SQL functions) to call AcquireExecutorLocks() explicitly
after fetching the plan.
No behavioral change. This separates plan retrieval from execution
setup, allowing a later commit to substitute pruning-aware locking
for eligible plans.
---
src/backend/commands/portalcmds.c | 1 +
src/backend/commands/prepare.c | 14 +++++-
src/backend/executor/functions.c | 14 ++++--
src/backend/executor/spi.c | 22 +++++++--
src/backend/tcop/postgres.c | 2 +
src/backend/tcop/pquery.c | 70 ++++++++++++++++++++++++++++-
src/backend/utils/cache/plancache.c | 44 +++++++++++++-----
src/backend/utils/mmgr/portalmem.c | 7 +++
src/include/utils/plancache.h | 1 +
src/include/utils/portal.h | 3 ++
10 files changed, 157 insertions(+), 21 deletions(-)
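Reader's note (not part of the patch): the lock-check-retry contract
described in the commit message can be sketched as a small standalone
C model. Everything named Toy* or toy_* is hypothetical, and the
toy_pending_invalidations counter is an invented stand-in for a
concurrent session's DDL being noticed while locks are taken. The
sketch assumes, as the commit message states, that a false return
means the locks have already been released and the caller should fetch
a fresh plan.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a cached plan; not the PostgreSQL CachedPlan. */
typedef struct ToyCachedPlan
{
    bool is_valid;
    int  locks_held;
    int  nrels;         /* relations the plan would need to lock */
} ToyCachedPlan;

/* Hypothetical knob: number of invalidations that arrive during locking. */
int toy_pending_invalidations = 0;

/* Lock or unlock every relation; locking may reveal an invalidation. */
void
toy_lock_all(ToyCachedPlan *plan, bool acquire)
{
    if (!acquire)
    {
        plan->locks_held = 0;
        return;
    }
    plan->locks_held = plan->nrels;
    if (toy_pending_invalidations > 0)
    {
        toy_pending_invalidations--;
        plan->is_valid = false;
    }
}

/* Lock everything, then recheck validity; undo the locks on failure. */
bool
toy_acquire_executor_locks(ToyCachedPlan *plan)
{
    toy_lock_all(plan, true);
    if (!plan->is_valid)
    {
        toy_lock_all(plan, false);
        return false;
    }
    return true;
}

/* Caller-side loop, shaped like the for (;;) loops in this patch. */
int
toy_fetch_and_lock(ToyCachedPlan *plan)
{
    int attempts = 0;

    for (;;)
    {
        /* Stand-in for fetching a plan: it comes back valid. */
        plan->is_valid = true;
        attempts++;
        if (toy_acquire_executor_locks(plan))
            break;
        /* Stand-in for releasing the stale plan; nothing to free here. */
    }
    return attempts;
}
```

With one pending invalidation the loop takes two attempts and ends
with all locks held on a valid plan; with none it succeeds on the
first attempt.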
diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index 01efac3319e..cf5deec4943 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -118,6 +118,7 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
queryString,
CMDTAG_SELECT, /* cursor's query is always a SELECT */
list_make1(plan),
+ NULL,
NULL);
/*----------
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..03d7a98fc58 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -207,6 +207,7 @@ ExecuteQuery(ParseState *pstate,
query_string,
entry->plansource->commandTag,
plan_list,
+ entry->plansource,
cplan);
/*
@@ -632,8 +633,17 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
}
/* Replan if needed, and acquire a transient refcount */
- cplan = GetCachedPlan(entry->plansource, paramLI,
- CurrentResourceOwner, pstate->p_queryEnv);
+ for (;;)
+ {
+ cplan = GetCachedPlan(entry->plansource, paramLI,
+ CurrentResourceOwner,
+ pstate->p_queryEnv);
+ plan_list = cplan->stmt_list;
+
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ }
INSTR_TIME_SET_CURRENT(planduration);
INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..2afb814a435 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -654,6 +654,7 @@ static bool
init_execution_state(SQLFunctionCachePtr fcache)
{
CachedPlanSource *plansource;
+ CachedPlan *cplan;
execution_state *preves = NULL;
execution_state *lasttages = NULL;
int nstmts;
@@ -696,10 +697,15 @@ init_execution_state(SQLFunctionCachePtr fcache)
* CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
*/
fcache->cowner = CurrentResourceOwner;
- fcache->cplan = GetCachedPlan(plansource,
- fcache->paramLI,
- fcache->cowner,
- NULL);
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, fcache->paramLI,
+ fcache->cowner, NULL);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, fcache->cowner);
+ }
+ fcache->cplan = cplan;
/*
* If necessary, make esarray[] bigger to hold the needed state.
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..268cd10bde8 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1686,6 +1686,7 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
query_string,
plansource->commandTag,
stmt_list,
+ plansource,
cplan);
/*
@@ -2106,6 +2107,16 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
_SPI_current->queryEnv);
Assert(cplan == plansource->gplan);
+ if (!AcquireExecutorLocks(cplan))
+ {
+ /* Plan invalidated during locking; get a fresh one. */
+ ReleaseCachedPlan(cplan,
+ plan->saved ? CurrentResourceOwner : NULL);
+ cplan = GetCachedPlan(plansource, NULL,
+ plan->saved ? CurrentResourceOwner : NULL,
+ _SPI_current->queryEnv);
+ }
+
/* Pop the error context stack */
error_context_stack = spierrcontext.previous;
@@ -2574,9 +2585,14 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
* Replan if needed, and increment plan refcount. If it's a saved
* plan, the refcount must be backed by the plan_owner.
*/
- cplan = GetCachedPlan(plansource, options->params,
- plan_owner, _SPI_current->queryEnv);
-
+ for (;;)
+ {
+ cplan = GetCachedPlan(plansource, options->params,
+ plan_owner, _SPI_current->queryEnv);
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, plan_owner);
+ }
stmt_list = cplan->stmt_list;
/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dbef734a93f..2929f158338 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -1243,6 +1243,7 @@ exec_simple_query(const char *query_string)
query_string,
commandTag,
plantree_list,
+ NULL,
NULL);
/*
@@ -2042,6 +2043,7 @@ exec_bind_message(StringInfo input_message)
query_string,
psrc->commandTag,
cplan->stmt_list,
+ psrc,
cplan);
/* Portal is defined, set the plan ID based on its contents. */
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index ee731000820..4699b53cab7 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,6 +59,7 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
+static bool PortalLockCachedPlan(Portal portal);
/*
@@ -463,6 +464,8 @@ PortalStart(Portal portal, ParamListInfo params,
*/
portal->strategy = ChoosePortalStrategy(portal->stmts);
+restart:
+
/*
* Fire her up according to the strategy
*/
@@ -485,6 +488,21 @@ PortalStart(Portal portal, ParamListInfo params,
* non-default nesting level for the snapshot.
*/
+ /*
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). If the plan is
+ * invalidated during locking, it replans and may change the
+ * portal strategy, requiring us to restart PortalStart().
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
+
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
@@ -535,6 +553,11 @@ PortalStart(Portal portal, ParamListInfo params,
case PORTAL_ONE_RETURNING:
case PORTAL_ONE_MOD_WITH:
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
/*
* We don't start the executor until we are told to run the
@@ -578,7 +601,20 @@ PortalStart(Portal portal, ParamListInfo params,
break;
case PORTAL_MULTI_QUERY:
- /* Need do nothing now */
+
+ /*
+ * GetCachedPlan() no longer acquires execution locks, so we
+ * must do it here. Multi-statement plans always use
+ * conservative locking (all partitions locked); pruning-aware
+ * locking is not feasible because PortalRunMulti() executes
+ * statements sequentially with CCI between them.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal))
+ goto restart;
+ }
+
portal->tupDesc = NULL;
break;
}
@@ -1786,3 +1822,35 @@ EnsurePortalSnapshotExists(void)
/* PushActiveSnapshotWithLevel might have copied the snapshot */
portal->portalSnapshot = GetActiveSnapshot();
}
+
+/*
+ * PortalLockCachedPlan
+ * Acquire execution locks for a cached-plan-backed portal,
+ * retrying with a fresh plan if the current one is invalidated.
+ *
+ * Returns true if replanning changed portal->strategy, meaning the
+ * caller must redispatch. Returns false once locks are held.
+ */
+static bool
+PortalLockCachedPlan(Portal portal)
+{
+ PortalStrategy start_strategy = portal->strategy;
+
+ if (AcquireExecutorLocks(portal->cplan))
+ return false;
+
+ /* Replan. Locks will be taken freshly. */
+ ReleaseCachedPlan(portal->cplan, portal->resowner);
+ portal->cplan = NULL;
+ portal->stmts = NIL;
+ portal->cplan = GetCachedPlan(portal->plansource,
+ portal->portalParams,
+ portal->resowner,
+ portal->queryEnv);
+ portal->stmts = portal->cplan->stmt_list;
+ portal->strategy = ChoosePortalStrategy(portal->stmts);
+ if (portal->strategy != start_strategy)
+ return true;
+
+ return false;
+}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..f7fe366859c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -100,7 +100,7 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
ParamListInfo boundParams);
static double cached_plan_cost(CachedPlan *plan, bool include_planner);
static Query *QueryListGetPrimaryStmt(List *stmts);
-static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static void AcquireExecutorLocksInt(List *stmt_list, bool acquire);
static void AcquirePlannerLocks(List *stmt_list, bool acquire);
static void ScanQueryForLocks(Query *parsetree, bool acquire);
static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -945,8 +945,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
* Caller must have already called RevalidateCachedQuery to verify that the
* querytree is up to date.
*
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, the generic plan may be reused as a valid cached
+ * plan. Any execution-time setup, including lock acquisition, is the
+ * caller's responsibility.
*/
static bool
CheckCachedPlan(CachedPlanSource *plansource)
@@ -983,8 +984,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
*/
Assert(plan->refcount > 0);
- AcquireExecutorLocks(plan->stmt_list, true);
-
/*
* If plan was transient, check to see if TransactionXmin has
* advanced, and if so invalidate it.
@@ -1003,9 +1002,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
/* Successfully revalidated and locked the query. */
return true;
}
-
- /* Oops, the race case happened. Release useless locks. */
- AcquireExecutorLocks(plan->stmt_list, false);
}
/*
@@ -1282,8 +1278,11 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
* plan or a custom plan for the given parameters: the caller does not know
* which it will get.
*
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid but no execution locks are held.
+ * The caller must call AcquireExecutorLocks() before executing.
+ * For freshly built plans (custom or new generic), the planner
+ * already holds the needed locks, so AcquireExecutorLocks() is
+ * redundant but harmless.
*
* On return, the refcount of the plan has been incremented; a later
* ReleaseCachedPlan() call is expected. If "owner" is not NULL then
@@ -1906,9 +1905,11 @@ QueryListGetPrimaryStmt(List *stmts)
/*
* AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
* or release them if acquire is false.
+ *
+ * This locks all relations in a given PlannedStmt's range table.
*/
static void
-AcquireExecutorLocks(List *stmt_list, bool acquire)
+AcquireExecutorLocksInt(List *stmt_list, bool acquire)
{
ListCell *lc1;
@@ -1955,6 +1956,27 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
}
}
+/*
+ * AcquireExecutorLocks
+ * Acquire execution locks on all relations in a cached plan.
+ *
+ * Returns true if the plan is still valid after locking. Returns
+ * false if the plan was invalidated while locks were being acquired,
+ * in which case the locks have been released and the caller should
+ * discard this plan and retry with a fresh one from GetCachedPlan().
+ */
+bool
+AcquireExecutorLocks(CachedPlan *cplan)
+{
+ AcquireExecutorLocksInt(cplan->stmt_list, true);
+ if (!cplan->is_valid)
+ {
+ AcquireExecutorLocksInt(cplan->stmt_list, false);
+ return false;
+ }
+ return true;
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index 493f9b0ee19..613f3be30b3 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -272,6 +272,10 @@ CreateNewPortal(void)
* the passed plan trees have adequate lifetime. Typically this is done by
* copying them into the portal's context.
*
+ * If plansource is provided, it is the CachedPlanSource that produced
+ * cplan. PortalLockCachedPlan() uses it to fetch a fresh plan if the
+ * current one is invalidated during execution lock acquisition.
+ *
* The caller is also responsible for ensuring that the passed prepStmtName
* (if not NULL) and sourceText have adequate lifetime.
*
@@ -286,6 +290,7 @@ PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan)
{
Assert(PortalIsValid(portal));
@@ -299,6 +304,7 @@ PortalDefineQuery(Portal portal,
portal->commandTag = commandTag;
SetQueryCompletion(&portal->qc, commandTag, 0);
portal->stmts = stmts;
+ portal->plansource = plansource;
portal->cplan = cplan;
portal->status = PORTAL_DEFINED;
}
@@ -517,6 +523,7 @@ PortalDrop(Portal portal, bool isTopCommit)
/* drop cached plan reference, if any */
PortalReleaseCachedPlan(portal);
+ portal->plansource = NULL;
/*
* If portal has a snapshot protecting its data, release that. This needs
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..e0fc403e717 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -241,6 +241,7 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
ParamListInfo boundParams,
ResourceOwner owner,
QueryEnvironment *queryEnv);
+extern bool AcquireExecutorLocks(CachedPlan *cplan);
extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
diff --git a/src/include/utils/portal.h b/src/include/utils/portal.h
index a7bedb12c18..3af535362cd 100644
--- a/src/include/utils/portal.h
+++ b/src/include/utils/portal.h
@@ -137,6 +137,8 @@ typedef struct PortalData
CommandTag commandTag; /* command tag for original query */
QueryCompletion qc; /* command completion data for executed query */
List *stmts; /* list of PlannedStmts */
+ CachedPlanSource *plansource; /* CachedPlanSource, for replanning on
+ * invalidation */
CachedPlan *cplan; /* CachedPlan, if stmts are from one */
ParamListInfo portalParams; /* params to pass to query */
@@ -240,6 +242,7 @@ extern void PortalDefineQuery(Portal portal,
const char *sourceText,
CommandTag commandTag,
List *stmts,
+ CachedPlanSource *plansource,
CachedPlan *cplan);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
--
2.47.3
[application/octet-stream] v12-0002-Refactor-executor-s-initial-partition-pruning-se.patch (7.3K, 3-v12-0002-Refactor-executor-s-initial-partition-pruning-se.patch)
From 29e5ad113f6974a94fbcf984b43fa3ed86f57632 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Wed, 25 Mar 2026 16:06:38 +0900
Subject: [PATCH v12 2/4] Refactor executor's initial partition pruning setup
Simplify handling of unpruned relids by moving responsibility
for recording them in EState into CreatePartitionPruneState(),
avoiding the need to pass all_leafpart_rtis as an out parameter.
Also move the setting of ecxt_param_exec_vals from
ExecCreatePartitionPruneState() to InitExecPartitionPruneContexts(),
to allow the former to be called before PARAM_EXEC parameters are
set up. A later commit needs this when running pruning state setup
outside of InitPlan().
No behavioral change.
---
src/backend/executor/execPartition.c | 70 +++++++++++++++++++---------
1 file changed, 48 insertions(+), 22 deletions(-)
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index d96d4f9947b..2a3af006f77 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -185,8 +185,7 @@ static char *ExecBuildSlotPartitionKeyDescription(Relation rel,
static List *adjust_partition_colnos(List *colnos, ResultRelInfo *leaf_part_rri);
static List *adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap);
static PartitionPruneState *CreatePartitionPruneState(EState *estate,
- PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis);
+ PartitionPruneInfo *pruneinfo);
static void InitPartitionPruneContext(PartitionPruneContext *context,
List *pruning_steps,
PartitionDesc partdesc,
@@ -1978,7 +1977,7 @@ adjust_partition_colnos_using_map(List *colnos, AttrMap *attrMap)
* estate->es_part_prune_infos. For each entry, it creates a PartitionPruneState
* and adds it to es_part_prune_states. ExecInitPartitionExecPruning() accesses
* these states through their corresponding indexes in es_part_prune_states and
- * assign each state to the parent node's PlanState, from where it will be used
+ * assigns each state to the parent node's PlanState, from where it will be used
* for "exec" pruning.
*
* If initial pruning steps exist for a PartitionPruneInfo entry, this function
@@ -1996,29 +1995,31 @@ ExecDoInitialPruning(EState *estate)
{
ListCell *lc;
+ Assert(estate->es_part_prune_results == NULL);
foreach(lc, estate->es_part_prune_infos)
{
PartitionPruneInfo *pruneinfo = lfirst_node(PartitionPruneInfo, lc);
PartitionPruneState *prunestate;
Bitmapset *validsubplans = NULL;
- Bitmapset *all_leafpart_rtis = NULL;
Bitmapset *validsubplan_rtis = NULL;
/* Create and save the PartitionPruneState. */
- prunestate = CreatePartitionPruneState(estate, pruneinfo,
- &all_leafpart_rtis);
+ prunestate = CreatePartitionPruneState(estate, pruneinfo);
estate->es_part_prune_states = lappend(estate->es_part_prune_states,
prunestate);
/*
* Perform initial pruning steps, if any, and save the result
- * bitmapset or NULL as described in the header comment.
+ * bitmapset or NULL as described in the header comment. RT indexes
+ * of surviving partitions would be added to validsubplan_rtis.
+ *
+ * Note that when do_initial_prune is false,
+ * CreatePartitionPruneState() would have already added the RT indexes
+ * of all leaf partitions to es_unpruned_relids directly.
*/
if (prunestate->do_initial_prune)
validsubplans = ExecFindMatchingSubPlans(prunestate, true,
&validsubplan_rtis);
- else
- validsubplan_rtis = all_leafpart_rtis;
estate->es_unpruned_relids = bms_add_members(estate->es_unpruned_relids,
validsubplan_rtis);
@@ -2136,14 +2137,12 @@ ExecInitPartitionExecPruning(PlanState *planstate,
* parent plan node's PlanState.
*
* If initial pruning steps are to be skipped (e.g., during EXPLAIN
- * (GENERIC_PLAN)), *all_leafpart_rtis will be populated with the RT indexes of
- * all leaf partitions whose scanning subnode is included in the parent plan
- * node's list of child plans. The caller must add these RT indexes to
- * estate->es_unpruned_relids.
+ * (GENERIC_PLAN)), the RT indexes of all leaf partitions whose scanning
+ * subnode is included in the parent plan node's list of child plans are
+ * added to estate->es_unpruned_relids.
*/
static PartitionPruneState *
-CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
- Bitmapset **all_leafpart_rtis)
+CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo)
{
PartitionPruneState *prunestate;
int n_part_hierarchies;
@@ -2377,8 +2376,8 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
pinfo->execparamids);
/*
- * Return all leaf partition indexes if we're skipping pruning in
- * the EXPLAIN (GENERIC_PLAN) case.
+ * Add all leaf partition indexes to es_unpruned_relids if we're
+ * skipping pruning in the EXPLAIN (GENERIC_PLAN) case.
*/
if (pinfo->initial_pruning_steps && !prunestate->do_initial_prune)
{
@@ -2390,9 +2389,28 @@ CreatePartitionPruneState(EState *estate, PartitionPruneInfo *pruneinfo,
Index rtindex = pprune->leafpart_rti_map[part_index];
if (rtindex)
- *all_leafpart_rtis = bms_add_member(*all_leafpart_rtis,
- rtindex);
+ estate->es_unpruned_relids =
+ bms_add_member(estate->es_unpruned_relids, rtindex);
+ }
+ }
+ else if (pinfo->initial_pruning_steps == NIL)
+ {
+ /*
+ * All partitions better be present in es_unpruned_relids when
+ * none are initially prunable.
+ */
+#ifdef USE_ASSERT_CHECKING
+ int part_index = -1;
+
+ while ((part_index = bms_next_member(pprune->present_parts,
+ part_index)) >= 0)
+ {
+ Index rtindex = pprune->leafpart_rti_map[part_index];
+
+ if (rtindex)
+ Assert(bms_is_member(rtindex, estate->es_unpruned_relids));
}
+#endif
}
j++;
@@ -2490,9 +2508,10 @@ InitPartitionPruneContext(PartitionPruneContext *context,
* Initialize exec pruning contexts deferred by CreatePartitionPruneState()
*
* This function finalizes exec pruning setup for a PartitionPruneState by
- * initializing contexts for pruning steps that require the parent plan's
- * PlanState. It iterates over PartitionPruningData entries and sets up the
- * necessary execution contexts for pruning during query execution.
+ * initializing contexts for pruning steps that require PARAM_EXEC parameters
+ * and the parent plan's PlanState. It iterates over PartitionPruningData
+ * entries and sets up the necessary execution contexts for pruning during
+ * query execution.
*
* Also fix the mapping of partition indexes to subplan indexes contained in
* prunestate by considering the new list of subplans that survived initial
@@ -2520,9 +2539,16 @@ InitExecPartitionPruneContexts(PartitionPruneState *prunestate,
bool fix_subplan_map = false;
Assert(prunestate->do_exec_prune);
+ Assert(prunestate->econtext);
Assert(parent_plan != NULL);
estate = parent_plan->state;
+ /*
+ * These might not be available when CreatePartitionPruneState() is
+ * called.
+ */
+ prunestate->econtext->ecxt_param_exec_vals = estate->es_param_exec_vals;
+
/*
* No need to fix subplans maps if initial pruning didn't eliminate any
* subplans.
--
2.47.3
[application/octet-stream] v12-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch (8.8K, 4-v12-0003-Introduce-ExecutorPrep-and-refactor-executor-sta.patch)
From 05c92346e2bec4c8ec9a7cf45ec572c15d64481f Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Thu, 26 Mar 2026 16:08:46 +0900
Subject: [PATCH v12 3/4] Introduce ExecutorPrep and refactor executor startup

Move permission checks, range table initialization, and initial
partition pruning out of InitPlan() into a new ExecutorPrep()
helper.

ExecutorStart() invokes ExecutorPrep() when QueryDesc->estate is
NULL, keeping current behavior unchanged. If QueryDesc->estate is
already set, ExecutorStart() reuses it.

This is preparatory refactoring only. No caller outside the
executor supplies a prebuilt EState in this commit.

In assert builds, verify that the expected relation locks are held
when entering ExecutorStart().
---
src/backend/executor/README | 10 ++-
src/backend/executor/execMain.c | 152 ++++++++++++++++++++++++++------
src/include/executor/execdesc.h | 2 +-
3 files changed, 132 insertions(+), 32 deletions(-)
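
Reviewer sketch (not part of the patch): the "prep once, reuse in
start" contract described above can be modelled in a few lines of
standalone C. All names below (ToyEState, ToyQueryDesc, toy_*) are
hypothetical stand-ins invented for illustration. They are not the
PostgreSQL structs or functions, and the flags only approximate what
ExecutorPrep() and InitPlan() set up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for EState; tracks which startup phases have run. */
typedef struct ToyEState
{
    bool range_table_ready;   /* set by prep */
    bool pruning_done;        /* set by prep */
    bool plan_tree_ready;     /* set by start */
} ToyEState;

/* Hypothetical stand-in for QueryDesc. */
typedef struct ToyQueryDesc
{
    ToyEState  storage;       /* backing storage, avoids malloc in the sketch */
    ToyEState *estate;        /* NULL until prep has run */
    int        prep_calls;    /* counts how often prep work was performed */
} ToyQueryDesc;

void
toy_query_desc_init(ToyQueryDesc *qd)
{
    qd->storage.range_table_ready = false;
    qd->storage.pruning_done = false;
    qd->storage.plan_tree_ready = false;
    qd->estate = NULL;
    qd->prep_calls = 0;
}

/* Models ExecutorPrep(): must only run when no state exists yet. */
void
toy_executor_prep(ToyQueryDesc *qd)
{
    assert(qd->estate == NULL);
    qd->storage.range_table_ready = true;
    qd->storage.pruning_done = true;
    qd->estate = &qd->storage;
    qd->prep_calls++;
}

/* Models ExecutorStart(): reuse a prebuilt state, else build one first. */
void
toy_executor_start(ToyQueryDesc *qd)
{
    if (qd->estate == NULL)
        toy_executor_prep(qd);

    /* Whichever path was taken, the prep work must be complete here. */
    assert(qd->estate->range_table_ready && qd->estate->pruning_done);
    qd->estate->plan_tree_ready = true;
}
```

Calling toy_executor_start() on a fresh descriptor and calling
toy_executor_prep() followed by toy_executor_start() both leave
prep_calls at 1, which is the property the refactoring relies on.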
diff --git a/src/backend/executor/README b/src/backend/executor/README
index 54f4782f31b..890bc3d9333 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -291,11 +291,17 @@ Query Processing Control Flow
This is a sketch of control flow for full query processing:
+ ExecutorPrep
+ May be run before ExecutorStart, or implicitly from ExecutorStart
+ if not done earlier. Creates the EState in QueryDesc, performs
+ range table initialization, permission checks, and initial
+ partition pruning.
+
CreateQueryDesc
ExecutorStart
- CreateExecutorState
- creates per-query context
+ ExecutorPrep (if QueryDesc.estate is NULL)
+ creates EState and per-query context
switch to per-query context to run ExecInitNode
AfterTriggerBeginQuery
ExecInitNode --- recursively scans plan tree
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 4b30f768680..2b9397b72f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -57,6 +57,7 @@
#include "parser/parse_relation.h"
#include "pgstat.h"
#include "rewrite/rewriteHandler.h"
+#include "storage/lmgr.h"
#include "tcop/utility.h"
#include "utils/acl.h"
#include "utils/backend_status.h"
@@ -76,6 +77,7 @@ ExecutorEnd_hook_type ExecutorEnd_hook = NULL;
ExecutorCheckPerms_hook_type ExecutorCheckPerms_hook = NULL;
/* decls for local routines only used within this module */
+static void ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags);
static void InitPlan(QueryDesc *queryDesc, int eflags);
static void CheckValidRowMarkRel(Relation rel, RowMarkType markType);
static void ExecPostprocessPlan(EState *estate);
@@ -147,7 +149,6 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/* sanity checks: queryDesc must not be started already */
Assert(queryDesc != NULL);
- Assert(queryDesc->estate == NULL);
/* caller must ensure the query's snapshot is active */
Assert(GetActiveSnapshot() == queryDesc->snapshot);
@@ -173,9 +174,67 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
/*
* Build EState, switch into per-query memory context for startup.
- */
- estate = CreateExecutorState();
- queryDesc->estate = estate;
+ *
+ * If ExecutorPrep() ran earlier (e.g., to do initial pruning during plan
+ * validity checking), reuse its EState to avoid redoing range table setup
+ * and pruning. Otherwise, create a fresh EState as usual.
+ *
+ * In assert builds, verify that the expected locks are held. When no
+ * prep EState was provided, AcquireExecutorLocks() should have locked
+ * every relation in the plan. When one was provided, pruning-aware
+ * locking should have locked at least the unpruned relations. Both
+ * checks are skipped in parallel workers, which acquire relation locks
+ * lazily in ExecGetRangeTableRelation().
+ */
+ if (queryDesc->estate == NULL)
+ {
+#ifdef USE_ASSERT_CHECKING
+ if (!IsParallelWorker())
+ {
+ ListCell *lc;
+
+ foreach(lc, queryDesc->plannedstmt->rtable)
+ {
+ RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
+
+ if (rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && rte->relid != InvalidOid))
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode,
+ true));
+ }
+ }
+#endif
+ ExecutorPrep(queryDesc, CurrentResourceOwner, eflags);
+ }
+#ifdef USE_ASSERT_CHECKING
+ else
+ {
+ /*
+ * A prep EState was provided, meaning pruning-aware locking should
+ * have locked at least the unpruned relations.
+ */
+ if (!IsParallelWorker())
+ {
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(queryDesc->estate->es_unpruned_relids,
+ rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = exec_rt_fetch(rtindex, queryDesc->estate);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY &&
+ rte->relid != InvalidOid));
+ Assert(CheckRelationOidLockedByMe(rte->relid,
+ rte->rellockmode, true));
+ }
+ }
+ }
+#endif
+
+ estate = queryDesc->estate;
+ Assert(estate);
oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
@@ -274,6 +333,64 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * ExecutorPrep
+ *
+ * Build the initial executor state for queryDesc before ExecutorStart().
+ *
+ * This creates the EState and performs the subset of executor startup that
+ * does not require plan-tree initialization, allowing that work to be reused
+ * by callers that need executor state before ExecutorStart():
+ *
+ * - initialize the range table
+ * - perform permission checks
+ * - perform initial partition pruning
+ *
+ * On success, queryDesc->estate is set and can later be reused by
+ * ExecutorStart() instead of rebuilding the same state.
+ *
+ * Caller must ensure that queryDesc->snapshot is active.
+ */
+static void
+ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
+{
+ ResourceOwner oldowner;
+ EState *estate;
+ PlannedStmt *pstmt;
+
+ Assert(queryDesc != NULL);
+
+ if (queryDesc->operation == CMD_UTILITY)
+ return;
+
+ Assert(ActiveSnapshotSet());
+ Assert(GetActiveSnapshot() == queryDesc->snapshot);
+ Assert(queryDesc->estate == NULL);
+
+ pstmt = queryDesc->plannedstmt;
+
+ estate = CreateExecutorState();
+ queryDesc->estate = estate;
+
+ estate->es_plannedstmt = pstmt;
+ estate->es_part_prune_infos = pstmt->partPruneInfos;
+ estate->es_param_list_info = queryDesc->params;
+ estate->es_queryEnv = queryDesc->queryEnv;
+ estate->es_top_eflags = eflags;
+
+ ExecCheckPermissions(pstmt->rtable, pstmt->permInfos, true);
+
+ ExecInitRangeTable(estate, pstmt->rtable, pstmt->permInfos,
+ bms_copy(pstmt->unprunableRelids));
+
+ oldowner = CurrentResourceOwner;
+ CurrentResourceOwner = owner;
+
+ ExecDoInitialPruning(estate);
+
+ CurrentResourceOwner = oldowner;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
@@ -849,37 +966,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
CmdType operation = queryDesc->operation;
PlannedStmt *plannedstmt = queryDesc->plannedstmt;
Plan *plan = plannedstmt->planTree;
- List *rangeTable = plannedstmt->rtable;
EState *estate = queryDesc->estate;
PlanState *planstate;
TupleDesc tupType;
ListCell *l;
int i;
- /*
- * Do permissions checks
- */
- ExecCheckPermissions(rangeTable, plannedstmt->permInfos, true);
-
- /*
- * initialize the node's execution state
- */
- ExecInitRangeTable(estate, rangeTable, plannedstmt->permInfos,
- bms_copy(plannedstmt->unprunableRelids));
-
- estate->es_plannedstmt = plannedstmt;
- estate->es_part_prune_infos = plannedstmt->partPruneInfos;
-
- /*
- * Perform runtime "initial" pruning to identify which child subplans,
- * corresponding to the children of plan nodes that contain
- * PartitionPruneInfo such as Append, will not be executed. The results,
- * which are bitmapsets of indexes of the child subplans that will be
- * executed, are saved in es_part_prune_results. These results correspond
- * to each PartitionPruneInfo entry, and the es_part_prune_results list is
- * parallel to es_part_prune_infos.
- */
- ExecDoInitialPruning(estate);
+ /* ExecutorPrep() must have been done. */
+ Assert(queryDesc->estate);
/*
* Next, build the ExecRowMark array from the PlanRowMark(s), if any.
diff --git a/src/include/executor/execdesc.h b/src/include/executor/execdesc.h
index 37c2576e4bc..aea5ec8ea02 100644
--- a/src/include/executor/execdesc.h
+++ b/src/include/executor/execdesc.h
@@ -45,7 +45,7 @@ typedef struct QueryDesc
int query_instr_options; /* OR of InstrumentOption flags for
* query_instr */
- /* These fields are set by ExecutorStart */
+ /* These fields are set by ExecutorStart or ExecutorPrep */
TupleDesc tupDesc; /* descriptor for result tuples */
EState *estate; /* executor's query-wide state */
PlanState *planstate; /* tree of per-plan-node state */
--
2.47.3
[application/octet-stream] v12-0004-Use-pruning-aware-locking-for-single-statement-c.patch (40.7K, 5-v12-0004-Use-pruning-aware-locking-for-single-statement-c.patch)
From c68d5de848572defbb58625d915f3323245294d4 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Sat, 4 Apr 2026 20:43:14 +0900
Subject: [PATCH v12 4/4] Use pruning-aware locking for single-statement cached
plans

For single-statement reused generic plans, perform initial partition
pruning before acquiring execution locks, then lock only the
surviving partitions.

Add ExecutorPrepAndLock() which encapsulates the pruning-aware lock
sequence: lock unprunable relations, call ExecutorPrep() to run
initial pruning, then lock survivors. Plan validity is checked
after each step; ExecutorPrepCleanup() handles the case where the
plan is invalidated between prep and execution.

Extend PortalLockCachedPlan() to use the pruning-aware path for
eligible plans (single-statement reused generic, non-utility).
All other cases continue using the conservative lock-all path
from the previous commit.

Track firstResultRels in PlannerGlobal and PlannedStmt so they
are locked even if pruned, preserving ExecInitModifyTable()
assumptions about the first result relation being available.

Multi-statement CachedPlans (from rule rewriting) always use
conservative locking, since PortalRunMulti() executes statements
sequentially with CCI between them and later statements' pruning
expressions may depend on earlier ones' effects. In principle,
this could be relaxed if the planner can prove that no pruning
expression reads state modified by an earlier statement, but that
is left for a future patch.

Regression tests are included to verify:

- Only surviving partitions are locked when pruning is enabled, and
all partitions are locked when it is disabled (pg_locks inspection).
- Multiple ModifyTable nodes (via writable CTEs) handle the case where
all target partitions are pruned, exercising firstResultRels.
- Plan invalidation during pruning-aware lock setup (DDL triggered by
a pruning expression) discards the prep state and replans cleanly.
- Multi-statement CachedPlans (from rule rewriting) fall back to
locking all partitions, avoiding stale pruning results.

Note for extension authors: code that accesses partition relations
through EState must check that the RT index is a member of
es_unpruned_relids before opening the relation. Previously this
was an optimization; it is now a correctness requirement, because
pruned partitions may not be locked.
---
src/backend/commands/explain.c | 45 +++--
src/backend/commands/prepare.c | 30 ++-
src/backend/executor/execMain.c | 142 ++++++++++++++
src/backend/executor/nodeModifyTable.c | 5 +-
src/backend/optimizer/plan/planner.c | 1 +
src/backend/optimizer/plan/setrefs.c | 18 ++
src/backend/tcop/pquery.c | 76 ++++++--
src/backend/utils/cache/plancache.c | 16 ++
src/include/commands/explain.h | 3 +-
src/include/executor/executor.h | 4 +
src/include/nodes/pathnodes.h | 3 +
src/include/nodes/plannodes.h | 10 +
src/include/utils/plancache.h | 2 +
src/test/regress/expected/partition_prune.out | 184 ++++++++++++++++++
src/test/regress/expected/plancache.out | 63 ++++++
src/test/regress/sql/partition_prune.sql | 116 +++++++++++
src/test/regress/sql/plancache.sql | 52 +++++
17 files changed, 731 insertions(+), 39 deletions(-)
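
Reviewer sketch (not part of the patch): a standalone toy model of the
lock sequence described in the commit message, assuming relations are
numbered 0..31 and sets are plain bitmasks. ToyPlan, toy_lock_set(),
toy_prep_and_lock() and the two sample pruning callbacks are all
hypothetical names invented here. The real code uses Bitmapset,
LockRelationOid() and ExecutorPrep(), none of which appear below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_RELS 32

/* Hypothetical stand-in for the plan plus its lock bookkeeping. */
typedef struct ToyPlan
{
    uint32_t unprunable;      /* bit i set: relation i is never pruned */
    uint32_t first_result;    /* bit i set: first result rel of a ModifyTable */
    bool     is_valid;        /* cleared when the plan is invalidated */
    int      lock_count[TOY_MAX_RELS];
} ToyPlan;

/* Pruning callback: returns surviving prunable rels; may invalidate the plan. */
typedef uint32_t (*toy_prune_fn) (ToyPlan *plan);

/* Acquire (or release) one lock on every relation in relids. */
static void
toy_lock_set(ToyPlan *plan, uint32_t relids, bool acquire)
{
    for (int i = 0; i < TOY_MAX_RELS; i++)
    {
        if (relids & (UINT32_C(1) << i))
            plan->lock_count[i] += acquire ? 1 : -1;
    }
}

/*
 * Lock unprunable rels, prune, lock survivors plus first result rels.
 * Returns false, with every lock released, if the plan became invalid.
 */
bool
toy_prep_and_lock(ToyPlan *plan, toy_prune_fn prune)
{
    uint32_t to_lock;

    toy_lock_set(plan, plan->unprunable, true);
    if (!plan->is_valid)
    {
        toy_lock_set(plan, plan->unprunable, false);
        return false;
    }

    /* Unprunable rels are already locked, so exclude them here. */
    to_lock = (prune(plan) | plan->first_result) & ~plan->unprunable;
    toy_lock_set(plan, to_lock, true);
    if (!plan->is_valid)
    {
        toy_lock_set(plan, to_lock, false);
        toy_lock_set(plan, plan->unprunable, false);
        return false;
    }

    return true;
}

/* Sample callback: only partition 2 survives pruning. */
uint32_t
toy_prune_keep_rel2(ToyPlan *plan)
{
    (void) plan;
    return UINT32_C(1) << 2;
}

/* Sample callback: models DDL fired by a pruning expression. */
uint32_t
toy_prune_then_invalidate(ToyPlan *plan)
{
    plan->is_valid = false;
    return UINT32_C(1) << 2;
}
```

With relation 0 unprunable, relation 1 recorded as a first result rel
and toy_prune_keep_rel2() as the callback, relations 0, 1 and 2 end up
locked once each and relation 3 stays unlocked. With
toy_prune_then_invalidate() the function returns false and all lock
counts drop back to zero, mirroring the retry path.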
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 112c17b0d64..c5254f0f920 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -377,7 +377,8 @@ standard_ExplainOneQuery(Query *query, int cursorOptions,
/* run it (if needed) and produce output */
ExplainOnePlan(plan, into, es, queryString, params, queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ NULL);
}
/*
@@ -501,7 +502,8 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
const char *queryString, ParamListInfo params,
QueryEnvironment *queryEnv, const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters)
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd)
{
DestReceiver *dest;
QueryDesc *queryDesc;
@@ -532,13 +534,6 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
*/
INSTR_TIME_SET_CURRENT(starttime);
- /*
- * Use a snapshot with an updated command ID to ensure this query sees
- * results of any previously executed queries.
- */
- PushCopiedSnapshot(GetActiveSnapshot());
- UpdateActiveSnapshotCommandId();
-
/*
* We discard the output if we have no use for it. If we're explaining
* CREATE TABLE AS, we'd better use the appropriate tuple receiver, while
@@ -554,10 +549,34 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
else
dest = None_Receiver;
- /* Create a QueryDesc for the query */
- queryDesc = CreateQueryDesc(plannedstmt, queryString,
- GetActiveSnapshot(), InvalidSnapshot,
- dest, params, queryEnv, instrument_option);
+ /*
+ * Create a QueryDesc for the query, or use the one provided by the
+ * caller. When reusing a prep QueryDesc, its snapshot was set at
+ * creation time; we push it as active for ExecutorStart and override the
+ * destination and instrument options, which were not known when the
+ * caller created it.
+ */
+ if (prep_qd)
+ {
+ PushActiveSnapshot(GetActiveSnapshot());
+ queryDesc = prep_qd;
+ Assert(queryDesc->dest == None_Receiver);
+ queryDesc->dest = dest;
+ queryDesc->instrument_options = instrument_option;
+ }
+ else
+ {
+ /*
+ * Use a snapshot with an updated command ID to ensure this query sees
+ * results of any previously executed queries.
+ */
+ PushCopiedSnapshot(GetActiveSnapshot());
+ UpdateActiveSnapshotCommandId();
+ queryDesc = CreateQueryDesc(plannedstmt, queryString,
+ GetActiveSnapshot(), InvalidSnapshot,
+ dest, params, queryEnv,
+ instrument_option);
+ }
/* Select execution options */
if (es->analyze)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 03d7a98fc58..3bbbc052149 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -588,6 +588,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
MemoryContextCounters mem_counters;
MemoryContext planner_ctx = NULL;
MemoryContext saved_ctx = NULL;
+ QueryDesc *prep_qd = NULL;
if (es->memory)
{
@@ -640,8 +641,31 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
pstate->p_queryEnv);
plan_list = cplan->stmt_list;
- if (AcquireExecutorLocks(cplan))
+ if (!CachedPlanCanPrep(cplan, entry->plansource))
+ {
+ if (AcquireExecutorLocks(cplan))
+ break;
+ ReleaseCachedPlan(cplan, CurrentResourceOwner);
+ continue;
+ }
+
+ prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, plan_list),
+ query_string,
+ GetActiveSnapshot(),
+ InvalidSnapshot,
+ None_Receiver, /* ExplainOnePlan will fix */
+ paramLI,
+ pstate->p_queryEnv,
+ 0 /* ExplainOnePlan will fix */ );
+ if (ExecutorPrepAndLock(prep_qd,
+ CurrentResourceOwner,
+ es->generic ? EXEC_FLAG_EXPLAIN_GENERIC : 0,
+ &cplan->is_valid))
break;
+
+ /* Try again. */
+ ExecutorPrepCleanup(prep_qd);
+ FreeQueryDesc(prep_qd);
ReleaseCachedPlan(cplan, CurrentResourceOwner);
}
@@ -664,6 +688,7 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
plan_list = cplan->stmt_list;
/* Explain each query */
+ Assert(prep_qd == NULL || list_length(plan_list) == 1);
foreach(p, plan_list)
{
PlannedStmt *pstmt = lfirst_node(PlannedStmt, p);
@@ -671,7 +696,8 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
if (pstmt->commandType != CMD_UTILITY)
ExplainOnePlan(pstmt, into, es, query_string, paramLI, pstate->p_queryEnv,
&planduration, (es->buffers ? &bufusage : NULL),
- es->memory ? &mem_counters : NULL);
+ es->memory ? &mem_counters : NULL,
+ prep_qd);
else
ExplainOneUtility(pstmt->utilityStmt, into, es, pstate, paramLI);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 2b9397b72f3..1e81377cfd8 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -333,6 +333,124 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags)
MemoryContextSwitchTo(oldcontext);
}
+/*
+ * LockRangeTableRelids
+ * Acquire or release locks on the specified relids, which reference
+ * entries in the provided range table.
+ *
+ * Helper for AcquireExecutorLocksPrepared() and ExecutorPrepAndLock().
+ */
+static void
+LockRangeTableRelids(List *rtable, Bitmapset *relids, bool acquire)
+{
+ int rtindex = -1;
+
+ while ((rtindex = bms_next_member(relids, rtindex)) >= 0)
+ {
+ RangeTblEntry *rte = list_nth_node(RangeTblEntry, rtable, rtindex - 1);
+
+ Assert(rte->rtekind == RTE_RELATION ||
+ (rte->rtekind == RTE_SUBQUERY && OidIsValid(rte->relid)));
+
+ /*
+ * Acquire the appropriate type of lock on each relation OID. Note
+ * that we don't actually try to open the rel, and hence will not fail
+ * if it's been dropped entirely --- we'll just transiently acquire a
+ * non-conflicting lock.
+ */
+ if (acquire)
+ LockRelationOid(rte->relid, rte->rellockmode);
+ else
+ UnlockRelationOid(rte->relid, rte->rellockmode);
+ }
+}
+
+/*
+ * AcquireExecutorLocksPrepared
+ *
+ * Acquire or release execution locks using pruning results already computed
+ * by ExecutorPrep() and stored in queryDesc->estate.
+ *
+ * This is intended for single-statement reused generic-plan paths that
+ * choose pruning-aware locking instead of the conservative
+ * AcquireExecutorLocks() path.
+ */
+static void
+AcquireExecutorLocksPrepared(QueryDesc *queryDesc, bool acquire)
+{
+ PlannedStmt *plannedstmt = queryDesc->plannedstmt;
+ EState *estate = queryDesc->estate;
+ Bitmapset *lock_relids;
+ ListCell *lc;
+
+ Assert(queryDesc != NULL);
+ Assert(estate != NULL);
+ Assert(plannedstmt != NULL);
+ Assert(plannedstmt->commandType != CMD_UTILITY);
+
+ lock_relids = bms_difference(estate->es_unpruned_relids,
+ plannedstmt->unprunableRelids);
+
+ /*
+ * Keep the first result relation of each ModifyTable locked even if
+ * pruning removed all target partitions. ExecInitModifyTable() relies on
+ * one such relation remaining available.
+ */
+ foreach(lc, plannedstmt->firstResultRels)
+ {
+ Index rti = lfirst_int(lc);
+
+ lock_relids = bms_add_member(lock_relids, rti);
+ }
+
+ LockRangeTableRelids(plannedstmt->rtable, lock_relids, acquire);
+
+ bms_free(lock_relids);
+
+}
+
+/*
+ * ExecutorPrepAndLock
+ * Perform pruning-aware locking for a single PlannedStmt.
+ *
+ * Locks unprunable relations first, then runs ExecutorPrep() to
+ * determine which partitions survive initial pruning, then locks
+ * only those survivors. Checks *is_valid after each locking step
+ * to detect plan invalidation (e.g., from concurrent DDL or DDL
+ * triggered by a pruning expression).
+ *
+ * Returns true if the plan is still valid and all needed locks are
+ * held. Returns false if the plan was invalidated at any point, in
+ * which case all acquired locks have been released and the caller
+ * should discard the QueryDesc and retry with a fresh plan.
+ */
+bool
+ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid)
+{
+ PlannedStmt *pstmt = queryDesc->plannedstmt;
+
+ /* Lock unprunable rels before pruning can access them. */
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, true);
+ if (!*is_valid)
+ {
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ /* Run pruning and lock survivors. */
+ ExecutorPrep(queryDesc, owner, eflags);
+ AcquireExecutorLocksPrepared(queryDesc, true);
+ if (!*is_valid)
+ {
+ AcquireExecutorLocksPrepared(queryDesc, false);
+ LockRangeTableRelids(pstmt->rtable, pstmt->unprunableRelids, false);
+ return false;
+ }
+
+ return true;
+}
+
/*
* ExecutorPrep
*
@@ -391,6 +509,30 @@ ExecutorPrep(QueryDesc *queryDesc, ResourceOwner owner, int eflags)
CurrentResourceOwner = oldowner;
}
+/*
+ * ExecutorPrepCleanup
+ * Clean up an EState that was created by ExecutorPrep() but never
+ * passed to ExecutorStart(). This happens when the plan is
+ * invalidated between prep and execution, and the caller must
+ * discard the prepped state before retrying with a fresh plan.
+ *
+ * Unlike ExecutorEnd(), this does not expect a fully initialized
+ * plan state tree -- only the range table relations and the
+ * EState itself need to be freed.
+ */
+void
+ExecutorPrepCleanup(QueryDesc *queryDesc)
+{
+ EState *estate = queryDesc->estate;
+
+ if (estate == NULL)
+ return;
+
+ ExecCloseRangeTableRelations(estate);
+ FreeExecutorState(estate);
+ queryDesc->estate = NULL;
+}
+
/* ----------------------------------------------------------------
* ExecutorRun
*
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 478cb01783c..350096bfbe7 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -5133,8 +5133,8 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
* as a reference for building the ResultRelInfo of the target partition.
* In either case, it doesn't matter which result relation is kept, so we
* just keep the first one, if all others have been pruned. See also,
- * ExecDoInitialPruning(), which ensures that this first result relation
- * has been locked.
+ * AcquireExecutorLocksPrepared(), which ensures that this first result
+ * relation has been locked.
*/
i = 0;
foreach(l, node->resultRelations)
@@ -5148,6 +5148,7 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
/* all result relations pruned; keep the first one */
keep_rel = true;
rti = linitial_int(node->resultRelations);
+ Assert(list_member_int(estate->es_plannedstmt->firstResultRels, rti));
i = 0;
}
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index f4689e7c9f8..4cddac7f2fc 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -675,6 +675,7 @@ standard_planner(Query *parse, const char *query_string, int cursorOptions,
glob->prunableRelids);
result->permInfos = glob->finalrteperminfos;
result->subrtinfos = glob->subrtinfos;
+ result->firstResultRels = glob->firstResultRels;
result->appendRelations = glob->appendRelations;
result->subplans = glob->subplans;
result->rewindPlanIDs = glob->rewindPlanIDs;
diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c
index ff0e875f2a2..6ee51f06920 100644
--- a/src/backend/optimizer/plan/setrefs.c
+++ b/src/backend/optimizer/plan/setrefs.c
@@ -384,6 +384,24 @@ set_plan_references(PlannerInfo *root, Plan *plan)
}
}
+ /*
+ * Record the first result relation if it belongs to the set of initially
+ * prunable relations. We use bms_next_member() to get the
+ * lowest-numbered leaf result rel, which matches
+ * linitial_int(ModifyTable.resultRelations) because partition expansion
+ * preserves RT index order. ExecInitModifyTable() asserts that the
+ * recorded index matches what it actually needs.
+ */
+ if (root->leaf_result_relids)
+ {
+ Index firstResultRel = bms_next_member(root->leaf_result_relids, -1);
+
+ firstResultRel += rtoffset;
+ if (bms_is_member(firstResultRel, root->glob->prunableRelids))
+ root->glob->firstResultRels =
+ lappend_int(root->glob->firstResultRels, firstResultRel);
+ }
+
return result;
}
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index 4699b53cab7..53c50ab0fce 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -59,7 +59,9 @@ static uint64 DoPortalRunFetch(Portal portal,
long count,
DestReceiver *dest);
static void DoPortalRewind(Portal portal);
-static bool PortalLockCachedPlan(Portal portal);
+static bool PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **queryDesc_p);
/*
@@ -488,21 +490,6 @@ restart:
* non-default nesting level for the snapshot.
*/
- /*
- * If the portal is backed by a cached plan, acquire execution
- * locks via PortalLockCachedPlan(). If the plan is
- * invalidated during locking, it replans and may change the
- * portal strategy, requiring us to restart PortalStart().
- */
- if (portal->cplan)
- {
- if (PortalLockCachedPlan(portal))
- {
- PopActiveSnapshot();
- goto restart;
- }
- }
-
/*
* Create QueryDesc in portal's context; for the moment, set
* the destination to DestNone.
@@ -516,6 +503,26 @@ restart:
portal->queryEnv,
0);
+ /*
+ * If the portal is backed by a cached plan, acquire execution
+ * locks via PortalLockCachedPlan(). For eligible plans
+ * (single-statement reused generic), this performs
+ * pruning-aware locking: it runs ExecutorPrep() on the
+ * QueryDesc to determine which partitions survive initial
+ * pruning, then locks only those. If the plan is invalidated
+ * during this process, it replans and rebuilds the QueryDesc.
+ * If replanning changes the portal strategy, we must restart
+ * PortalStart() to redispatch.
+ */
+ if (portal->cplan)
+ {
+ if (PortalLockCachedPlan(portal, true, params, &queryDesc))
+ {
+ PopActiveSnapshot();
+ goto restart;
+ }
+ }
+
/*
* If it's a scrollable cursor, executor needs to support
* REWIND and backwards scan, as well as whatever the caller
@@ -555,7 +562,7 @@ restart:
case PORTAL_ONE_MOD_WITH:
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -611,7 +618,7 @@ restart:
*/
if (portal->cplan)
{
- if (PortalLockCachedPlan(portal))
+ if (PortalLockCachedPlan(portal, false, NULL, NULL))
goto restart;
}
@@ -1828,15 +1835,32 @@ EnsurePortalSnapshotExists(void)
* Acquire execution locks for a cached-plan-backed portal,
* retrying with a fresh plan if the current one is invalidated.
*
+ * If do_prep is true and the plan is eligible (single-statement reused
+ * generic plan), performs pruning-aware locking via ExecutorPrepAndLock()
+ * on *prep_qd, replacing it with a fresh QueryDesc if the plan is replanned.
+ * Otherwise falls back to locking all relations in the plan.
+ *
* Returns true if replanning changed portal->strategy, meaning the
- * caller must redispatch. Returns false once locks are held.
+ * caller must redispatch. Returns false once locks are held and the
+ * plan is valid for execution.
*/
static bool
-PortalLockCachedPlan(Portal portal)
+PortalLockCachedPlan(Portal portal, bool do_prep,
+ ParamListInfo params,
+ QueryDesc **prep_qd)
{
PortalStrategy start_strategy = portal->strategy;
- if (AcquireExecutorLocks(portal->cplan))
+ if (do_prep && CachedPlanCanPrep(portal->cplan, portal->plansource))
+ {
+ Assert(prep_qd);
+ if (ExecutorPrepAndLock(*prep_qd, portal->resowner, 0,
+ &portal->cplan->is_valid))
+ return false;
+ ExecutorPrepCleanup(*prep_qd);
+ FreeQueryDesc(*prep_qd);
+ }
+ else if (AcquireExecutorLocks(portal->cplan))
return false;
/* Replan. Locks will be taken freshly. */
@@ -1852,5 +1876,15 @@ PortalLockCachedPlan(Portal portal)
if (portal->strategy != start_strategy)
return true;
+ if (prep_qd)
+ {
+ Assert(list_length(portal->stmts) == 1);
+ *prep_qd = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
+ portal->sourceText,
+ GetActiveSnapshot(), InvalidSnapshot,
+ None_Receiver, params,
+ portal->queryEnv, 0);
+ }
+
return false;
}
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index f7fe366859c..fca2f84081e 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -1977,6 +1977,22 @@ AcquireExecutorLocks(CachedPlan *cplan)
return true;
}
+/*
+ * CachedPlanCanPrep
+ * Check whether a cached plan is eligible for pruning-aware locking
+ * via ExecutorPrepAndLock().
+ *
+ * Only single-statement reused generic plans with a non-utility command
+ * qualify.
+ */
+bool
+CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource)
+{
+ return (cplan == plansource->gplan &&
+ list_length(cplan->stmt_list) == 1 &&
+ linitial_node(PlannedStmt, cplan->stmt_list)->commandType != CMD_UTILITY);
+}
+
/*
* AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
* or release them if acquire is false.
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 472e141bba3..3a03355e6b6 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -69,7 +69,8 @@ extern void ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into,
ParamListInfo params, QueryEnvironment *queryEnv,
const instr_time *planduration,
const BufferUsage *bufusage,
- const MemoryContextCounters *mem_counters);
+ const MemoryContextCounters *mem_counters,
+ QueryDesc *prep_qd);
extern void ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc);
extern void ExplainPrintTriggers(ExplainState *es,
diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h
index 33bbdbfeffb..093be9bd24b 100644
--- a/src/include/executor/executor.h
+++ b/src/include/executor/executor.h
@@ -21,6 +21,7 @@
#include "nodes/lockoptions.h"
#include "nodes/parsenodes.h"
#include "utils/memutils.h"
+#include "utils/resowner.h"
/*
@@ -235,6 +236,9 @@ ExecGetJunkAttribute(TupleTableSlot *slot, AttrNumber attno, bool *isNull)
*/
extern void ExecutorStart(QueryDesc *queryDesc, int eflags);
extern void standard_ExecutorStart(QueryDesc *queryDesc, int eflags);
+extern bool ExecutorPrepAndLock(QueryDesc *queryDesc, ResourceOwner owner,
+ int eflags, bool *is_valid);
+extern void ExecutorPrepCleanup(QueryDesc *queryDesc);
extern void ExecutorRun(QueryDesc *queryDesc,
ScanDirection direction, uint64 count);
extern void standard_ExecutorRun(QueryDesc *queryDesc,
diff --git a/src/include/nodes/pathnodes.h b/src/include/nodes/pathnodes.h
index 27a2c6815b7..a5d00633b4b 100644
--- a/src/include/nodes/pathnodes.h
+++ b/src/include/nodes/pathnodes.h
@@ -217,6 +217,9 @@ typedef struct PlannerGlobal
/* "flat" list of integer RT indexes */
List *resultRelations;
+ /* "flat" list of integer RT indexes (one per ModifyTable node) */
+ List *firstResultRels;
+
/* "flat" list of AppendRelInfos */
List *appendRelations;
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 14a1dfed2b9..7f6f7cda781 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -120,6 +120,16 @@ typedef struct PlannedStmt
/* RT indexes of relations targeted by INSERT/UPDATE/DELETE/MERGE */
Bitmapset *resultRelationRelids;
+ /*
+ * rtable indexes of first target relation in each ModifyTable node in the
+ * plan for INSERT/UPDATE/DELETE/MERGE. NIL if resultRelations is NIL.
+ *
+ * These are used by AcquireExecutorLocksUnpruned() to ensure that the
+ * first result rel for each ModifyTable remains locked even if pruned;
+ * see ExecInitModifyTable() for the executor side assumptions.
+ */
+ List *firstResultRels;
+
/* list of AppendRelInfo nodes */
List *appendRelations;
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index e0fc403e717..2941d3a301b 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -254,4 +254,6 @@ extern bool CachedPlanIsSimplyValid(CachedPlanSource *plansource,
extern CachedExpression *GetCachedExpression(Node *expr);
extern void FreeCachedExpression(CachedExpression *cexpr);
+extern bool CachedPlanCanPrep(CachedPlan *cplan, CachedPlanSource *plansource);
+
#endif /* PLANCACHE_H */
diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out
index 849049f9c51..ec73866486e 100644
--- a/src/test/regress/expected/partition_prune.out
+++ b/src/test/regress/expected/partition_prune.out
@@ -4956,3 +4956,187 @@ select * from (select a, b from phv_boolpart) t
(2 rows)
drop table phv_boolpart;
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(4 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+(1 row)
+
+commit;
+deallocate prunelock_q;
+-- Turn pruning off
+set enable_partition_pruning to off;
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+ QUERY PLAN
+----------------------------------------------
+ Append
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p2 prunelock_p_2
+ Filter: (a = $1)
+ -> Seq Scan on prunelock_p3 prunelock_p_3
+ Filter: (a = $1)
+(7 rows)
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+ a
+---
+(0 rows)
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+ relname
+--------------
+ prunelock_p1
+ prunelock_p2
+ prunelock_p3
+(3 rows)
+
+commit;
+deallocate prunelock_q;
+reset enable_partition_pruning;
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ Update on prunelock_p1 prunelock_p_1
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_3
+ Update on prunelock_p1 prunelock_p_4
+ Update on prunelock_p2 prunelock_p_5
+ Update on prunelock_p3 prunelock_p_6
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_4
+ -> Seq Scan on prunelock_p2 prunelock_p_5
+ -> Seq Scan on prunelock_p3 prunelock_p_6
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_7
+ Update on prunelock_p2 prunelock_p_8
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p2 prunelock_p_8
+ Filter: (a = $2)
+ -> Append
+ Subplans Removed: 2
+ -> Seq Scan on prunelock_p1 prunelock_p_1
+ Filter: (a = $1)
+(22 rows)
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+ QUERY PLAN
+------------------------------------------------------------
+ Update on prunelock_p
+ CTE upd1
+ -> Update on prunelock_p prunelock_p_2
+ Update on prunelock_p1 prunelock_p_3
+ Update on prunelock_p2 prunelock_p_4
+ Update on prunelock_p3 prunelock_p_5
+ -> Append
+ -> Seq Scan on prunelock_p1 prunelock_p_3
+ -> Seq Scan on prunelock_p2 prunelock_p_4
+ -> Seq Scan on prunelock_p3 prunelock_p_5
+ CTE upd2
+ -> Update on prunelock_p prunelock_p_6
+ -> Append
+ Subplans Removed: 3
+ -> Append
+ Subplans Removed: 3
+(16 rows)
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+ a | b
+---+---
+ 1 | 0
+ 2 | 1
+(2 rows)
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/expected/plancache.out b/src/test/regress/expected/plancache.out
index d58534ca1cd..54077294dce 100644
--- a/src/test/regress/expected/plancache.out
+++ b/src/test/regress/expected/plancache.out
@@ -402,3 +402,66 @@ select name, generic_plans, custom_plans from pg_prepared_statements
(1 row)
drop table test_mode;
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+NOTICE: creating index on partition inval_during_pruning_p1
+ QUERY PLAN
+---------------------------------------------------------------------------
+ Append
+ Subplans Removed: 1
+ -> Seq Scan on public.inval_during_pruning_p1 inval_during_pruning_p_1
+ Output: inval_during_pruning_p_1.a
+ Filter: (inval_during_pruning_p_1.a = stable_pruning_val())
+(5 rows)
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/partition_prune.sql b/src/test/regress/sql/partition_prune.sql
index 359a9208056..a98844d14f8 100644
--- a/src/test/regress/sql/partition_prune.sql
+++ b/src/test/regress/sql/partition_prune.sql
@@ -1518,3 +1518,119 @@ select * from (select a, b from phv_boolpart) t
group by grouping sets (a, b);
drop table phv_boolpart;
+
+--
+-- Verify that pruning-aware locking skips pruned partitions
+-- when reusing a generic cached plan.
+--
+set plan_cache_mode to force_generic_plan;
+
+create table prunelock_p (a int) partition by list (a);
+create table prunelock_p1 partition of prunelock_p for values in (1);
+create table prunelock_p2 partition of prunelock_p for values in (2);
+create table prunelock_p3 partition of prunelock_p for values in (3);
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+
+-- Turn pruning off
+set enable_partition_pruning to off;
+
+prepare prunelock_q (int) as select * from prunelock_p where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_q(1);
+
+-- Execute and check which child partitions are locked
+begin;
+execute prunelock_q(1);
+
+select c.relname
+ from pg_locks l
+ join pg_class c on c.oid = l.relation
+ where l.pid = pg_backend_pid()
+ and c.relname like 'prunelock_p_'
+ order by c.relname;
+commit;
+
+deallocate prunelock_q;
+reset enable_partition_pruning;
+
+--
+-- Verify firstResultRels handling with multiple ModifyTable nodes
+-- (writable CTEs) targeting a partitioned table. When a pruning
+-- parameter matches no partition, all result relations are pruned
+-- and the executor must still find a usable first result relation
+-- for each ModifyTable node.
+--
+prepare prunelock_mt_q (int, int) as
+ with upd1 as (update prunelock_p set a = a),
+ upd2 as (update prunelock_p set a = a where a = $2)
+ update prunelock_p set a = a where a = $1;
+
+-- Force generic plan creation
+explain (costs off) execute prunelock_mt_q(1, 2);
+
+-- All partitions pruned: value 4 matches no partition, so each
+-- ModifyTable must still initialize correctly with no matching
+-- result relations.
+explain (costs off) execute prunelock_mt_q(4, 5);
+
+deallocate prunelock_mt_q;
+drop table prunelock_p;
+
+--
+-- Verify that pruning-aware locking falls back to locking all
+-- partitions for multi-statement CachedPlans. Rule rewriting can
+-- expand a single statement into multiple PlannedStmts, and later
+-- statements must not have their pruning evaluated before earlier
+-- ones have executed, since CCI between statements can change what
+-- pruning expressions see.
+--
+create table prune_config (val int);
+insert into prune_config values (1);
+
+create table multistmt_pt (a int, b int) partition by list (a);
+create table multistmt_pt_1 partition of multistmt_pt for values in (1);
+create table multistmt_pt_2 partition of multistmt_pt for values in (2);
+insert into multistmt_pt values (1, 0), (2, 0);
+
+create function get_prune_val() returns int as $$
+ select val from prune_config;
+$$ language sql stable;
+
+create rule config_upd_rule as on update to multistmt_pt
+ do also update prune_config set val = 2;
+
+set plan_cache_mode to force_generic_plan;
+prepare multi_q as update multistmt_pt set b = b + 1 where a = get_prune_val();
+-- first execute creates the generic plan
+execute multi_q;
+-- reset for the real test
+update prune_config set val = 1;
+update multistmt_pt set b = 0;
+-- second execute reuses the plan; pruning-aware locking kicks in
+execute multi_q;
+select * from multistmt_pt order by a;
+
+deallocate multi_q;
+drop rule config_upd_rule on multistmt_pt;
+drop function get_prune_val;
+drop table multistmt_pt, prune_config;
+reset plan_cache_mode;
diff --git a/src/test/regress/sql/plancache.sql b/src/test/regress/sql/plancache.sql
index aed388d03a1..90b6c5f82bf 100644
--- a/src/test/regress/sql/plancache.sql
+++ b/src/test/regress/sql/plancache.sql
@@ -228,3 +228,55 @@ select name, generic_plans, custom_plans from pg_prepared_statements
where name = 'test_mode_pp';
drop table test_mode;
+
+-- This exercises the CachedPlanPrepCleanup() path, which must free
+-- the EState created by ExecutorPrep() when the plan is invalidated
+-- before execution begins. The pruning expression uses a stable SQL
+-- function that calls a volatile plpgsql function. That function
+-- performs DDL on a partition when a separate "signal" table says to
+-- do so. The second EXECUTE should replan cleanly after the DDL.
+set plan_cache_mode to force_generic_plan;
+create table inval_during_pruning_p (a int) partition by list (a);
+create table inval_during_pruning_p1 partition of inval_during_pruning_p for values in (1);
+create table inval_during_pruning_p2 partition of inval_during_pruning_p for values in (2);
+insert into inval_during_pruning_p values (1), (2);
+
+create table inval_during_pruning_signal (create_idx bool not null);
+insert into inval_during_pruning_signal values (false);
+create or replace function invalidate_plancache_func() returns int
+as $$
+declare
+ create_index bool;
+begin
+ -- Perform DDL on a partition if asked to
+ select create_idx into create_index from inval_during_pruning_signal for update;
+ if create_index = true then
+ raise notice 'creating index on partition inval_during_pruning_p1';
+ create index on inval_during_pruning_p1 (a);
+ update inval_during_pruning_signal set create_idx = false;
+ end if;
+ -- value that pruning will match against partition bounds
+ return 1;
+end;
+$$ language plpgsql volatile;
+
+create or replace function stable_pruning_val() returns int as $$
+ select invalidate_plancache_func();
+$$ language sql stable;
+
+prepare inval_during_pruning_q as select * from inval_during_pruning_p where a = stable_pruning_val();
+
+-- Build a generic plan and run pruning once, but don't set the signal
+-- for invalidate_plancache_func() to perform the DDL.
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+-- Reuse the generic plan. Make invalidate_plancache_func() perform DDL
+-- during this execution, which should force replanning without errors.
+update inval_during_pruning_signal set create_idx = true;
+explain (verbose, costs off) execute inval_during_pruning_q;
+
+deallocate inval_during_pruning_q;
+drop table inval_during_pruning_p, inval_during_pruning_signal;
+drop function invalidate_plancache_func, stable_pruning_val;
+
+reset plan_cache_mode;
--
2.47.3
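
For readers following the plancache.c hunk, the eligibility rule in
CachedPlanCanPrep() can be tried in isolation. The sketch below is not part
of the patch. It assumes hypothetical stand-in structs (a plain array in
place of the stmt_list List, a three-value CmdType) and uses a lower-case
function name, cached_plan_can_prep(), to make clear it is only an
illustration of the three conditions: the plan is the source's generic plan,
it holds exactly one statement, and that statement is not a utility command.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the PostgreSQL types named in the patch.
 * These are NOT the real definitions; they carry only the fields the
 * eligibility check looks at.
 */
typedef enum
{
	CMD_SELECT,
	CMD_UPDATE,
	CMD_UTILITY
} CmdType;

typedef struct PlannedStmt
{
	CmdType		commandType;
} PlannedStmt;

typedef struct CachedPlan
{
	const PlannedStmt *stmts;	/* stands in for stmt_list */
	size_t		nstmts;			/* stands in for list_length(stmt_list) */
} CachedPlan;

typedef struct CachedPlanSource
{
	const CachedPlan *gplan;	/* the cached generic plan, if any */
} CachedPlanSource;

/*
 * Illustrative model of the check: true only for a reused generic plan
 * that contains exactly one non-utility statement.
 */
bool
cached_plan_can_prep(const CachedPlan *cplan,
					 const CachedPlanSource *plansource)
{
	return (cplan == plansource->gplan &&
			cplan->nstmts == 1 &&
			cplan->stmts[0].commandType != CMD_UTILITY);
}
```

Under these assumptions, a one-statement SELECT plan that is the source's
gplan passes, while a custom plan (any CachedPlan other than gplan), a
two-statement plan such as the one produced by the rule in the multi_q
test, or a utility statement does not, which corresponds to the fallback
to locking all partitions described in the test comments.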