public inbox for [email protected]  
help / color / mirror / Atom feed
From: Amit Langote <[email protected]>
To: PostgreSQL-development <[email protected]>
Subject: GetCachedPlan() refactor: move execution lock acquisition out
Date: Tue, 14 Apr 2026 17:19:44 +0900
Message-ID: <CA+HiwqE1ntHy2h9zJ9v3MwAkoGAveSERcHWkDTTZnP0kxWqbKQ@mail.gmail.com> (raw)

Over in the "generic plans and initial pruning" thread [1], I came to
think that the cleanest way to address the architectural issue behind
that thread is to make GetCachedPlan() do less execution-related work.
Specifically, it should stop acquiring execution locks itself, and
leave that decision to callers.

GetCachedPlan() has long combined plan acquisition with execution
setup, and that coupling makes it harder to improve either side
cleanly. Tom pointed in this direction back in 2023 [2]:

"...the whole problem arises because the system is wrongly factored.
We should get rid of AcquireExecutorLocks altogether, allowing the
plancache to hand back a generic plan that it's not certain of the
validity of, and instead integrate the responsibility for acquiring
locks into executor startup."

This patch takes the first half of that suggestion by moving execution
locking out of the plancache and into its callers.

The attached patch does that, with no intended behavioral change:

* Remove AcquireExecutorLocks() from CheckCachedPlan(), so
GetCachedPlan() returns a plan without holding execution locks.

* Add a new internal helper in plancache.c, LockCachedPlan(), which
acquires execution locks, rechecks validity, and returns false if the
plan was invalidated, so the caller can retry.

* Add GetCachedPlanLocked() as a convenience wrapper that fetches a
plan, calls LockCachedPlan(), and retries on invalidation. This
becomes the normal entry point for callers that want a plan ready for
execution.

* Convert existing GetCachedPlan() callers to use GetCachedPlanLocked().

The second half of Tom's suggestion, moving that responsibility into
executor startup, is trickier. ExecutorStart() is a stable API with
extension hooks, so changing its contract has a fairly high bar. I
tried that route once already [3].

A working variant that adds an ExecutorPrep() entry point, with a
wrapper implementing pruning-aware locking on top, was discussed in
the original thread [1].  But rather than propose that whole stack at
once, I think it is better to do this in phases: first decouple plan
acquisition from execution locking, then revisit the ExecutorPrep()
piece separately if this goes in.

The eventual goal is still pruning-aware locking. I'm posting this
piece separately because the original thread has become fairly long,
and this part seems easier to review on its own.

[1] https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
[2] https://postgr.es/m/4191508.1674157166%40sss.pgh.pa.us
[3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1722d5eb05d8e5d2e064cd1798abcae4f296c...

--
Thanks, Amit Langote


Attachments:

  [application/octet-stream] v1-0001-Move-execution-lock-acquisition-out-of-GetCachedP.patch (10.7K, 2-v1-0001-Move-execution-lock-acquisition-out-of-GetCachedP.patch)
  download | inline diff:
From 5707788ed4ccec870a128cf2c960b7cd6cd72249 Mon Sep 17 00:00:00 2001
From: Amit Langote <[email protected]>
Date: Tue, 14 Apr 2026 16:37:22 +0900
Subject: [PATCH v1] Move execution lock acquisition out of GetCachedPlan()

GetCachedPlan() previously acquired executor locks as part of
checking whether a cached plan could be used. Split those concerns
so that GetCachedPlan() no longer acquires execution locks.

Add an internal helper, LockCachedPlan(), which acquires
execution locks on all plan relations and returns false if the
plan is invalidated while locking, allowing the caller to retry.

Add GetCachedPlanLocked() as a convenience wrapper that fetches a
plan, locks it using the new helper, and retries on invalidation.
Convert all current callers to use this wrapper.

No intended behavioral change for current callers. They still
acquire execution locks on all plan relations as before. This
refactoring makes it possible to add more selective locking in
follow-on work.

Discussion: https://postgr.es/m/
---
 src/backend/commands/prepare.c      |  7 +--
 src/backend/executor/functions.c    |  6 +--
 src/backend/executor/spi.c          | 14 ++---
 src/backend/tcop/postgres.c         |  2 +-
 src/backend/utils/cache/plancache.c | 80 +++++++++++++++++++++++++----
 src/include/utils/plancache.h       |  4 ++
 6 files changed, 88 insertions(+), 25 deletions(-)

diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 876aad2100a..467911ea980 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -195,7 +195,7 @@ ExecuteQuery(ParseState *pstate,
 									   entry->plansource->query_string);
 
 	/* Replan if needed, and increment plan refcount for portal */
-	cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL);
+	cplan = GetCachedPlanLocked(entry->plansource, paramLI, NULL, NULL);
 	plan_list = cplan->stmt_list;
 
 	/*
@@ -632,8 +632,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es,
 	}
 
 	/* Replan if needed, and acquire a transient refcount */
-	cplan = GetCachedPlan(entry->plansource, paramLI,
-						  CurrentResourceOwner, pstate->p_queryEnv);
+	cplan = GetCachedPlanLocked(entry->plansource, paramLI,
+								CurrentResourceOwner,
+								pstate->p_queryEnv);
 
 	INSTR_TIME_SET_CURRENT(planduration);
 	INSTR_TIME_SUBTRACT(planduration, planstart);
diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index 88109348817..4ff5f130457 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -696,10 +696,8 @@ init_execution_state(SQLFunctionCachePtr fcache)
 	 * CurrentResourceOwner will be the same when ShutdownSQLFunction runs.)
 	 */
 	fcache->cowner = CurrentResourceOwner;
-	fcache->cplan = GetCachedPlan(plansource,
-								  fcache->paramLI,
-								  fcache->cowner,
-								  NULL);
+	fcache->cplan = GetCachedPlanLocked(plansource, fcache->paramLI,
+										fcache->cowner, NULL);
 
 	/*
 	 * If necessary, make esarray[] bigger to hold the needed state.
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index 52f3b11301c..58238dd3f34 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -1660,7 +1660,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan,
 	 */
 
 	/* Replan if needed, and increment plan refcount for portal */
-	cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv);
+	cplan = GetCachedPlanLocked(plansource, paramLI, NULL,
+								_SPI_current->queryEnv);
 	stmt_list = cplan->stmt_list;
 
 	if (!plan->saved)
@@ -2101,9 +2102,9 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan)
 	error_context_stack = &spierrcontext;
 
 	/* Get the generic plan for the query */
-	cplan = GetCachedPlan(plansource, NULL,
-						  plan->saved ? CurrentResourceOwner : NULL,
-						  _SPI_current->queryEnv);
+	cplan = GetCachedPlanLocked(plansource, NULL,
+								plan->saved ? CurrentResourceOwner : NULL,
+								_SPI_current->queryEnv);
 	Assert(cplan == plansource->gplan);
 
 	/* Pop the error context stack */
@@ -2574,9 +2575,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options,
 		 * Replan if needed, and increment plan refcount.  If it's a saved
 		 * plan, the refcount must be backed by the plan_owner.
 		 */
-		cplan = GetCachedPlan(plansource, options->params,
-							  plan_owner, _SPI_current->queryEnv);
-
+		cplan = GetCachedPlanLocked(plansource, options->params,
+									plan_owner, _SPI_current->queryEnv);
 		stmt_list = cplan->stmt_list;
 
 		/*
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index aeaf1c6db8f..3782e5197d4 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2028,7 +2028,7 @@ exec_bind_message(StringInfo input_message)
 	 * will be generated in MessageContext.  The plan refcount will be
 	 * assigned to the Portal, so it will be released at portal destruction.
 	 */
-	cplan = GetCachedPlan(psrc, params, NULL, NULL);
+	cplan = GetCachedPlanLocked(psrc, params, NULL, NULL);
 
 	/*
 	 * Now we can define the portal.
diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c
index 698e7c1aa22..10da50f0d9c 100644
--- a/src/backend/utils/cache/plancache.c
+++ b/src/backend/utils/cache/plancache.c
@@ -101,6 +101,7 @@ static bool choose_custom_plan(CachedPlanSource *plansource,
 static double cached_plan_cost(CachedPlan *plan, bool include_planner);
 static Query *QueryListGetPrimaryStmt(List *stmts);
 static void AcquireExecutorLocks(List *stmt_list, bool acquire);
+static bool LockCachedPlan(CachedPlan *cplan);
 static void AcquirePlannerLocks(List *stmt_list, bool acquire);
 static void ScanQueryForLocks(Query *parsetree, bool acquire);
 static bool ScanQueryWalker(Node *node, bool *acquire);
@@ -945,8 +946,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource,
  * Caller must have already called RevalidateCachedQuery to verify that the
  * querytree is up to date.
  *
- * On a "true" return, we have acquired the locks needed to run the plan.
- * (We must do this for the "true" result to be race-condition-free.)
+ * On a "true" return, the generic plan may be reused as a valid cached
+ * plan.  Any execution-time setup, including lock acquisition, is the
+ * caller's responsibility.
  */
 static bool
 CheckCachedPlan(CachedPlanSource *plansource)
@@ -983,8 +985,6 @@ CheckCachedPlan(CachedPlanSource *plansource)
 		 */
 		Assert(plan->refcount > 0);
 
-		AcquireExecutorLocks(plan->stmt_list, true);
-
 		/*
 		 * If plan was transient, check to see if TransactionXmin has
 		 * advanced, and if so invalidate it.
@@ -1000,12 +1000,9 @@ CheckCachedPlan(CachedPlanSource *plansource)
 		 */
 		if (plan->is_valid)
 		{
-			/* Successfully revalidated and locked the query. */
+			/* Successfully revalidated the query. */
 			return true;
 		}
-
-		/* Oops, the race case happened.  Release useless locks. */
-		AcquireExecutorLocks(plan->stmt_list, false);
 	}
 
 	/*
@@ -1282,8 +1279,10 @@ cached_plan_cost(CachedPlan *plan, bool include_planner)
  * plan or a custom plan for the given parameters: the caller does not know
  * which it will get.
  *
- * On return, the plan is valid and we have sufficient locks to begin
- * execution.
+ * On return, the plan is valid but execution locks are not held.  The
+ * caller is responsible for acquiring them before execution.  Most
+ * callers should use GetCachedPlanLocked() instead, which handles
+ * locking and retry-on-invalidation.
  *
  * On return, the refcount of the plan has been incremented; a later
  * ReleaseCachedPlan() call is expected.  If "owner" is not NULL then
@@ -1413,6 +1412,42 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams,
 	return plan;
 }
 
+/*
+ * GetCachedPlanLocked: fetch a cached plan ready for execution.
+ *
+ * This is a convenience wrapper around GetCachedPlan() that also acquires
+ * execution locks.  If the plan is invalidated while locks are being
+ * acquired, it releases the plan and retries until a valid, locked plan is
+ * obtained.
+ *
+ * LockCachedPlan() is called unconditionally, even for freshly built plans
+ * where the planner already holds the needed locks.  The redundant
+ * LockRelationOid() calls are cheap: a hash lookup and local refcount bump
+ * per relation.
+ *
+ * This is the normal entry point for callers that need a plan ready to
+ * execute.  Callers that need custom locking behavior should call
+ * GetCachedPlan() directly and implement their own locking.
+ */
+CachedPlan *
+GetCachedPlanLocked(CachedPlanSource *plansource, ParamListInfo boundParams,
+					ResourceOwner owner, QueryEnvironment *queryEnv)
+{
+	CachedPlan *cplan = NULL;
+
+	for (;;)
+	{
+		cplan = GetCachedPlan(plansource, boundParams, owner, queryEnv);
+		if (LockCachedPlan(cplan))
+			break;
+
+		ReleaseCachedPlan(cplan, owner);
+	}
+
+	Assert(cplan);
+	return cplan;
+}
+
 /*
  * ReleaseCachedPlan: release active use of a cached plan.
  *
@@ -1906,6 +1941,9 @@ QueryListGetPrimaryStmt(List *stmts)
 /*
  * AcquireExecutorLocks: acquire locks needed for execution of a cached plan;
  * or release them if acquire is false.
+ *
+ * Acquire or release execution locks on all relations referenced by the
+ * PlannedStmts in stmt_list.
  */
 static void
 AcquireExecutorLocks(List *stmt_list, bool acquire)
@@ -1955,6 +1993,28 @@ AcquireExecutorLocks(List *stmt_list, bool acquire)
 	}
 }
 
+/*
+ * LockCachedPlan
+ *		Acquire execution locks on all relations referenced by a cached
+ *		plan.
+ *
+ * Returns true if the plan is still valid after locking.  Returns false if
+ * the plan was invalidated while locks were being acquired, in which case
+ * any locks acquired by this function have been released and the caller
+ * should discard this plan and retry with a fresh one from GetCachedPlan().
+ */
+static bool
+LockCachedPlan(CachedPlan *cplan)
+{
+	AcquireExecutorLocks(cplan->stmt_list, true);
+	if (!cplan->is_valid)
+	{
+		AcquireExecutorLocks(cplan->stmt_list, false);
+		return false;
+	}
+	return true;
+}
+
 /*
  * AcquirePlannerLocks: acquire locks needed for planning of a querytree list;
  * or release them if acquire is false.
diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h
index 7a4a85c8038..a58f422a816 100644
--- a/src/include/utils/plancache.h
+++ b/src/include/utils/plancache.h
@@ -241,6 +241,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource,
 								 ParamListInfo boundParams,
 								 ResourceOwner owner,
 								 QueryEnvironment *queryEnv);
+extern CachedPlan *GetCachedPlanLocked(CachedPlanSource *plansource,
+									   ParamListInfo boundParams,
+									   ResourceOwner owner,
+									   QueryEnvironment *queryEnv);
 extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner);
 
 extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource,
-- 
2.47.3



view thread (6+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected]
  Subject: Re: GetCachedPlan() refactor: move execution lock acquisition out
  In-Reply-To: <CA+HiwqE1ntHy2h9zJ9v3MwAkoGAveSERcHWkDTTZnP0kxWqbKQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox