public inbox for [email protected]help / color / mirror / Atom feed
GetCachedPlan() refactor: move execution lock acquisition out 6+ messages / 3 participants [nested] [flat]
* GetCachedPlan() refactor: move execution lock acquisition out @ 2026-04-14 08:19 Amit Langote <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Amit Langote @ 2026-04-14 08:19 UTC (permalink / raw) To: pgsql-hackers Over in the "generic plans and initial pruning" thread [1], I came to think that the cleanest way to address the architectural issue behind that thread is to make GetCachedPlan() do less execution-related work. Specifically, it should stop acquiring execution locks itself, and leave that decision to callers. GetCachedPlan() has long combined plan acquisition with execution setup, and that coupling makes it harder to improve either side cleanly. Tom pointed in this direction back in 2023 [2]: "...the whole problem arises because the system is wrongly factored. We should get rid of AcquireExecutorLocks altogether, allowing the plancache to hand back a generic plan that it's not certain of the validity of, and instead integrate the responsibility for acquiring locks into executor startup." This patch takes the first half of that suggestion by moving execution locking out of the plancache and into its callers. The attached patch does that, with no intended behavioral change: * Remove AcquireExecutorLocks() from CheckCachedPlan(), so GetCachedPlan() returns a plan without holding execution locks. * Add a new internal helper in plancache.c, LockCachedPlan(), which acquires execution locks, rechecks validity, and returns false if the plan was invalidated, so the caller can retry. * Add GetCachedPlanLocked() as a convenience wrapper that fetches a plan, calls LockCachedPlan(), and retries on invalidation. This becomes the normal entry point for callers that want a plan ready for execution. * Convert existing GetCachedPlan() callers to use GetCachedPlanLocked(). The second half of Tom's suggestion, moving that responsibility into executor startup, is trickier. ExecutorStart() is a stable API with extension hooks, so changing its contract has a fairly high bar. I tried that route once already [3]. A working variant that adds an ExecutorPrep() entry point, with a wrapper implementing pruning-aware locking on top, was discussed in the original thread [1]. But rather than propose that whole stack at once, I think it is better to do this in phases: first decouple plan acquisition from execution locking, then revisit the ExecutorPrep() piece separately if this goes in. The eventual goal is still pruning-aware locking. I'm posting this piece separately because the original thread has become fairly long, and this part seems easier to review on its own. [1] https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com [2] https://postgr.es/m/4191508.1674157166%40sss.pgh.pa.us [3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1722d5eb05d8e5d2e064cd1798abcae4f296c... -- Thanks, Amit Langote Attachments: [application/octet-stream] v1-0001-Move-execution-lock-acquisition-out-of-GetCachedP.patch (10.7K, 2-v1-0001-Move-execution-lock-acquisition-out-of-GetCachedP.patch) download | inline diff: From 5707788ed4ccec870a128cf2c960b7cd6cd72249 Mon Sep 17 00:00:00 2001 From: Amit Langote <[email protected]> Date: Tue, 14 Apr 2026 16:37:22 +0900 Subject: [PATCH v1] Move execution lock acquisition out of GetCachedPlan() GetCachedPlan() previously acquired executor locks as part of checking whether a cached plan could be used. Split those concerns so that GetCachedPlan() no longer acquires execution locks. Add an internal helper, LockCachedPlan(), which acquires execution locks on all plan relations and returns false if the plan is invalidated while locking, allowing the caller to retry. Add GetCachedPlanLocked() as a convenience wrapper that fetches a plan, locks it using the new helper, and retries on invalidation. Convert all current callers to use this wrapper. No intended behavioral change for current callers. They still acquire execution locks on all plan relations as before. This refactoring makes it possible to add more selective locking in follow-on work. Discussion: https://postgr.es/m/ --- src/backend/commands/prepare.c | 7 +-- src/backend/executor/functions.c | 6 +-- src/backend/executor/spi.c | 14 ++--- src/backend/tcop/postgres.c | 2 +- src/backend/utils/cache/plancache.c | 80 +++++++++++++++++++++++++---- src/include/utils/plancache.h | 4 ++ 6 files changed, 88 insertions(+), 25 deletions(-) diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c index 876aad2100a..467911ea980 100644 --- a/src/backend/commands/prepare.c +++ b/src/backend/commands/prepare.c @@ -195,7 +195,7 @@ ExecuteQuery(ParseState *pstate, entry->plansource->query_string); /* Replan if needed, and increment plan refcount for portal */ - cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL); + cplan = GetCachedPlanLocked(entry->plansource, paramLI, NULL, NULL); plan_list = cplan->stmt_list; /* @@ -632,8 +632,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es, } /* Replan if needed, and acquire a transient refcount */ - cplan = GetCachedPlan(entry->plansource, paramLI, - CurrentResourceOwner, pstate->p_queryEnv); + cplan = GetCachedPlanLocked(entry->plansource, paramLI, + CurrentResourceOwner, + pstate->p_queryEnv); INSTR_TIME_SET_CURRENT(planduration); INSTR_TIME_SUBTRACT(planduration, planstart); diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c index 88109348817..4ff5f130457 100644 --- a/src/backend/executor/functions.c +++ b/src/backend/executor/functions.c @@ -696,10 +696,8 @@ init_execution_state(SQLFunctionCachePtr fcache) * CurrentResourceOwner will be the same when ShutdownSQLFunction runs.) */ fcache->cowner = CurrentResourceOwner; - fcache->cplan = GetCachedPlan(plansource, - fcache->paramLI, - fcache->cowner, - NULL); + fcache->cplan = GetCachedPlanLocked(plansource, fcache->paramLI, + fcache->cowner, NULL); /* * If necessary, make esarray[] bigger to hold the needed state. diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index 52f3b11301c..58238dd3f34 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -1660,7 +1660,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan, */ /* Replan if needed, and increment plan refcount for portal */ - cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv); + cplan = GetCachedPlanLocked(plansource, paramLI, NULL, + _SPI_current->queryEnv); stmt_list = cplan->stmt_list; if (!plan->saved) @@ -2101,9 +2102,9 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan) error_context_stack = &spierrcontext; /* Get the generic plan for the query */ - cplan = GetCachedPlan(plansource, NULL, - plan->saved ? CurrentResourceOwner : NULL, - _SPI_current->queryEnv); + cplan = GetCachedPlanLocked(plansource, NULL, + plan->saved ? CurrentResourceOwner : NULL, + _SPI_current->queryEnv); Assert(cplan == plansource->gplan); /* Pop the error context stack */ @@ -2574,9 +2575,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options, * Replan if needed, and increment plan refcount. If it's a saved * plan, the refcount must be backed by the plan_owner. */ - cplan = GetCachedPlan(plansource, options->params, - plan_owner, _SPI_current->queryEnv); - + cplan = GetCachedPlanLocked(plansource, options->params, + plan_owner, _SPI_current->queryEnv); stmt_list = cplan->stmt_list; /* diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index aeaf1c6db8f..3782e5197d4 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2028,7 +2028,7 @@ exec_bind_message(StringInfo input_message) * will be generated in MessageContext. The plan refcount will be * assigned to the Portal, so it will be released at portal destruction. */ - cplan = GetCachedPlan(psrc, params, NULL, NULL); + cplan = GetCachedPlanLocked(psrc, params, NULL, NULL); /* * Now we can define the portal. diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c index 698e7c1aa22..10da50f0d9c 100644 --- a/src/backend/utils/cache/plancache.c +++ b/src/backend/utils/cache/plancache.c @@ -101,6 +101,7 @@ static bool choose_custom_plan(CachedPlanSource *plansource, static double cached_plan_cost(CachedPlan *plan, bool include_planner); static Query *QueryListGetPrimaryStmt(List *stmts); static void AcquireExecutorLocks(List *stmt_list, bool acquire); +static bool LockCachedPlan(CachedPlan *cplan); static void AcquirePlannerLocks(List *stmt_list, bool acquire); static void ScanQueryForLocks(Query *parsetree, bool acquire); static bool ScanQueryWalker(Node *node, bool *acquire); @@ -945,8 +946,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource, * Caller must have already called RevalidateCachedQuery to verify that the * querytree is up to date. * - * On a "true" return, we have acquired the locks needed to run the plan. - * (We must do this for the "true" result to be race-condition-free.) + * On a "true" return, the generic plan may be reused as a valid cached + * plan. Any execution-time setup, including lock acquisition, is the + * caller's responsibility. */ static bool CheckCachedPlan(CachedPlanSource *plansource) @@ -983,8 +985,6 @@ CheckCachedPlan(CachedPlanSource *plansource) */ Assert(plan->refcount > 0); - AcquireExecutorLocks(plan->stmt_list, true); - /* * If plan was transient, check to see if TransactionXmin has * advanced, and if so invalidate it. @@ -1000,12 +1000,9 @@ CheckCachedPlan(CachedPlanSource *plansource) */ if (plan->is_valid) { - /* Successfully revalidated and locked the query. */ + /* Successfully revalidated the query. */ return true; } - - /* Oops, the race case happened. Release useless locks. */ - AcquireExecutorLocks(plan->stmt_list, false); } /* @@ -1282,8 +1279,10 @@ cached_plan_cost(CachedPlan *plan, bool include_planner) * plan or a custom plan for the given parameters: the caller does not know * which it will get. * - * On return, the plan is valid and we have sufficient locks to begin - * execution. + * On return, the plan is valid but execution locks are not held. The + * caller is responsible for acquiring them before execution. Most + * callers should use GetCachedPlanLocked() instead, which handles + * locking and retry-on-invalidation. * * On return, the refcount of the plan has been incremented; a later * ReleaseCachedPlan() call is expected. If "owner" is not NULL then @@ -1413,6 +1412,42 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams, return plan; } +/* + * GetCachedPlanLocked: fetch a cached plan ready for execution. + * + * This is a convenience wrapper around GetCachedPlan() that also acquires + * execution locks. If the plan is invalidated while locks are being + * acquired, it releases the plan and retries until a valid, locked plan is + * obtained. + * + * LockCachedPlan() is called unconditionally, even for freshly built plans + * where the planner already holds the needed locks. The redundant + * LockRelationOid() calls are cheap: a hash lookup and local refcount bump + * per relation. + * + * This is the normal entry point for callers that need a plan ready to + * execute. Callers that need custom locking behavior should call + * GetCachedPlan() directly and implement their own locking. + */ +CachedPlan * +GetCachedPlanLocked(CachedPlanSource *plansource, ParamListInfo boundParams, + ResourceOwner owner, QueryEnvironment *queryEnv) +{ + CachedPlan *cplan = NULL; + + for (;;) + { + cplan = GetCachedPlan(plansource, boundParams, owner, queryEnv); + if (LockCachedPlan(cplan)) + break; + + ReleaseCachedPlan(cplan, owner); + } + + Assert(cplan); + return cplan; +} + /* * ReleaseCachedPlan: release active use of a cached plan. * @@ -1906,6 +1941,9 @@ QueryListGetPrimaryStmt(List *stmts) /* * AcquireExecutorLocks: acquire locks needed for execution of a cached plan; * or release them if acquire is false. + * + * Acquire or release execution locks on all relations referenced by the + * PlannedStmts in stmt_list. */ static void AcquireExecutorLocks(List *stmt_list, bool acquire) @@ -1955,6 +1993,28 @@ AcquireExecutorLocks(List *stmt_list, bool acquire) } } +/* + * LockCachedPlan + * Acquire execution locks on all relations referenced by a cached + * plan. + * + * Returns true if the plan is still valid after locking. Returns false if + * the plan was invalidated while locks were being acquired, in which case + * any locks acquired by this function have been released and the caller + * should discard this plan and retry with a fresh one from GetCachedPlan(). + */ +static bool +LockCachedPlan(CachedPlan *cplan) +{ + AcquireExecutorLocks(cplan->stmt_list, true); + if (!cplan->is_valid) + { + AcquireExecutorLocks(cplan->stmt_list, false); + return false; + } + return true; +} + /* * AcquirePlannerLocks: acquire locks needed for planning of a querytree list; * or release them if acquire is false. diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h index 7a4a85c8038..a58f422a816 100644 --- a/src/include/utils/plancache.h +++ b/src/include/utils/plancache.h @@ -241,6 +241,10 @@ extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams, ResourceOwner owner, QueryEnvironment *queryEnv); +extern CachedPlan *GetCachedPlanLocked(CachedPlanSource *plansource, + ParamListInfo boundParams, + ResourceOwner owner, + QueryEnvironment *queryEnv); extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner); extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource, -- 2.47.3 ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: GetCachedPlan() refactor: move execution lock acquisition out @ 2026-04-14 16:08 Kirill Reshke <[email protected]> parent: Amit Langote <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Kirill Reshke @ 2026-04-14 16:08 UTC (permalink / raw) To: Amit Langote <[email protected]>; +Cc: pgsql-hackers On Tue, 14 Apr 2026 at 13:20, Amit Langote <[email protected]> wrote: > > Over in the "generic plans and initial pruning" thread [1], I came to > think that the cleanest way to address the architectural issue behind > that thread is to make GetCachedPlan() do less execution-related work. > Specifically, it should stop acquiring execution locks itself, and > leave that decision to callers. > > GetCachedPlan() has long combined plan acquisition with execution > setup, and that coupling makes it harder to improve either side > cleanly. Tom pointed in this direction back in 2023 [2]: > > "...the whole problem arises because the system is wrongly factored. > We should get rid of AcquireExecutorLocks altogether, allowing the > plancache to hand back a generic plan that it's not certain of the > validity of, and instead integrate the responsibility for acquiring > locks into executor startup." > > This patch takes the first half of that suggestion by moving execution > locking out of the plancache and into its callers. > > The attached patch does that, with no intended behavioral change: > > * Remove AcquireExecutorLocks() from CheckCachedPlan(), so > GetCachedPlan() returns a plan without holding execution locks. > > * Add a new internal helper in plancache.c, LockCachedPlan(), which > acquires execution locks, rechecks validity, and returns false if the > plan was invalidated, so the caller can retry. > > * Add GetCachedPlanLocked() as a convenience wrapper that fetches a > plan, calls LockCachedPlan(), and retries on invalidation. This > becomes the normal entry point for callers that want a plan ready for > execution. > > * Convert existing GetCachedPlan() callers to use GetCachedPlanLocked(). > > The second half of Tom's suggestion, moving that responsibility into > executor startup, is trickier. ExecutorStart() is a stable API with > extension hooks, so changing its contract has a fairly high bar. I > tried that route once already [3]. > > A working variant that adds an ExecutorPrep() entry point, with a > wrapper implementing pruning-aware locking on top, was discussed in > the original thread [1]. But rather than propose that whole stack at > once, I think it is better to do this in phases: first decouple plan > acquisition from execution locking, then revisit the ExecutorPrep() > piece separately if this goes in. > > The eventual goal is still pruning-aware locking. I'm posting this > piece separately because the original thread has become fairly long, > and this part seems easier to review on its own. > > [1] https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com > [2] https://postgr.es/m/4191508.1674157166%40sss.pgh.pa.us > [3] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=1722d5eb05d8e5d2e064cd1798abcae4f296c... > > -- > Thanks, Amit Langote Hi! I have slightly checked v1. Can't tell if refactoring does make cached plan maintenance more straightforward because of lack of experience in this area, but here's my 2c > + > + for (;;) > + { > + cplan = GetCachedPlan(plansource, boundParams, owner, queryEnv); > + if (LockCachedPlan(cplan)) > + break; > + > + ReleaseCachedPlan(cplan, owner); > + } Should we use CFI here? > +static bool > +LockCachedPlan(CachedPlan *cplan) > +{ > + AcquireExecutorLocks(cplan->stmt_list, true); > + if (!cplan->is_valid) > + { > + AcquireExecutorLocks(cplan->stmt_list, false); > + return false; > + } > + return true; > +} simply `return cplan->is_valid ` would be more Postgres-y here, isnt it? -- Best regards, Kirill Reshke ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: GetCachedPlan() refactor: move execution lock acquisition out @ 2026-04-15 02:14 Amit Langote <[email protected]> parent: Kirill Reshke <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Amit Langote @ 2026-04-15 02:14 UTC (permalink / raw) To: Kirill Reshke <[email protected]>; +Cc: pgsql-hackers Hi, On Wed, Apr 15, 2026 at 1:08 AM Kirill Reshke <[email protected]> wrote: > On Tue, 14 Apr 2026 at 13:20, Amit Langote <[email protected]> wrote: > > Over in the "generic plans and initial pruning" thread [1], I came to > > think that the cleanest way to address the architectural issue behind > > that thread is to make GetCachedPlan() do less execution-related work. > > Specifically, it should stop acquiring execution locks itself, and > > leave that decision to callers. > > > > GetCachedPlan() has long combined plan acquisition with execution > > setup, and that coupling makes it harder to improve either side > > cleanly. Tom pointed in this direction back in 2023 [2]: > > > > "...the whole problem arises because the system is wrongly factored. > > We should get rid of AcquireExecutorLocks altogether, allowing the > > plancache to hand back a generic plan that it's not certain of the > > validity of, and instead integrate the responsibility for acquiring > > locks into executor startup." > > > > This patch takes the first half of that suggestion by moving execution > > locking out of the plancache and into its callers. > > > > The attached patch does that, with no intended behavioral change: > > > > * Remove AcquireExecutorLocks() from CheckCachedPlan(), so > > GetCachedPlan() returns a plan without holding execution locks. > > > > * Add a new internal helper in plancache.c, LockCachedPlan(), which > > acquires execution locks, rechecks validity, and returns false if the > > plan was invalidated, so the caller can retry. > > > > * Add GetCachedPlanLocked() as a convenience wrapper that fetches a > > plan, calls LockCachedPlan(), and retries on invalidation. This > > becomes the normal entry point for callers that want a plan ready for > > execution. > > > > * Convert existing GetCachedPlan() callers to use GetCachedPlanLocked(). > > > > The second half of Tom's suggestion, moving that responsibility into > > executor startup, is trickier. ExecutorStart() is a stable API with > > extension hooks, so changing its contract has a fairly high bar. I > > tried that route once already [3]. > > > > A working variant that adds an ExecutorPrep() entry point, with a > > wrapper implementing pruning-aware locking on top, was discussed in > > the original thread [1]. But rather than propose that whole stack at > > once, I think it is better to do this in phases: first decouple plan > > acquisition from execution locking, then revisit the ExecutorPrep() > > piece separately if this goes in. > > > > The eventual goal is still pruning-aware locking. I'm posting this > > piece separately because the original thread has become fairly long, > > and this part seems easier to review on its own. > > Hi! I have slightly checked v1. Can't tell if refactoring does make > cached plan maintenance more straightforward because of lack of > experience in this area, but here's my 2c Thanks for taking a look. Could you say a bit more about what aspect of cached plan maintenance you have in mind? My main motivation here is to separate plan acquisition from execution-time locking, so that locking policy no longer has to live inside GetCachedPlan(). > > + > > + for (;;) > > + { > > + cplan = GetCachedPlan(plansource, boundParams, owner, queryEnv); > > + if (LockCachedPlan(cplan)) > > + break; > > + > > + ReleaseCachedPlan(cplan, owner); > > + } > > Should we use CFI here? Good point. On a closer look, I think the retry loop can go away entirely. If LockCachedPlan() fails, releasing the plan and calling GetCachedPlan() again is enough, because that will rebuild the plan and the needed locks will already be held by the planner. I'll simplify that in the next version. > > +static bool > > +LockCachedPlan(CachedPlan *cplan) > > +{ > > + AcquireExecutorLocks(cplan->stmt_list, true); > > + if (!cplan->is_valid) > > + { > > + AcquireExecutorLocks(cplan->stmt_list, false); > > + return false; > > + } > > + return true; > > +} > > simply `return cplan->is_valid ` would be more Postgres-y here, isnt it? Agreed, will fix too. I'll hold off on posting a v2 for now. -- Thanks, Amit Langote ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: GetCachedPlan() refactor: move execution lock acquisition out @ 2026-04-15 13:46 Amit Langote <[email protected]> parent: Amit Langote <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Amit Langote @ 2026-04-15 13:46 UTC (permalink / raw) To: Kirill Reshke <[email protected]>; +Cc: pgsql-hackers On Wed, Apr 15, 2026 at 11:14 AM Amit Langote <[email protected]> wrote: > On Wed, Apr 15, 2026 at 1:08 AM Kirill Reshke <[email protected]> wrote: > > > +static bool > > > +LockCachedPlan(CachedPlan *cplan) > > > +{ > > > + AcquireExecutorLocks(cplan->stmt_list, true); > > > + if (!cplan->is_valid) > > > + { > > > + AcquireExecutorLocks(cplan->stmt_list, false); > > > + return false; > > > + } > > > + return true; > > > +} > > > > simply `return cplan->is_valid ` would be more Postgres-y here, isnt it? > > Agreed, will fix too. Thinking more about this bit, checking only cplan->is_valid after locking is not really equivalent to what CheckCachedPlan() was doing before. CheckCachedPlan() can also reset plan->is_valid based on other state (role, xmin), whereas in v1 the new check relied only on lock inval callbacks to have set is_valid to false. That means the new check is not enough. It may happen to be okay for the current callers in this patch, but it is not something I think is safe to rely on once there are future callers that do arbitrary work, such as ExecutorPrep(), even if carefully coded, between the original GetCachedPlan() / CheckCachedPlan() step and the later recheck after locking. Separately, I also realized that v1 was introducing redundant lock acquisition for freshly built plans, which is not really a no behavior change refactoring either. So in v2 I ended up reworking two parts more substantially: * The patch now factors the reused-generic-plan validity logic into RecheckCachedPlan(). * It also adds an is_reused output argument to GetCachedPlan(), so callers can distinguish the reused-plan case from a freshly built plan and avoid the extra lock step in the latter case. -- Thanks, Amit Langote Attachments: [application/octet-stream] v2-0001-Move-execution-lock-acquisition-out-of-GetCachedP.patch (15.4K, 2-v2-0001-Move-execution-lock-acquisition-out-of-GetCachedP.patch) download | inline diff: From b1cf7f4648cf3ce156ec355f159ac26b8e8ce73e Mon Sep 17 00:00:00 2001 From: Amit Langote <[email protected]> Date: Wed, 15 Apr 2026 14:27:55 +0900 Subject: [PATCH v2] Move execution lock acquisition out of GetCachedPlan() GetCachedPlan() previously acquired executor locks as part of checking whether a cached plan could be used. Split those concerns so that GetCachedPlan() no longer acquires execution locks. Add an internal helper, LockCachedPlan(), which acquires execution locks on all plan relations and returns false if the plan is invalidated while locking, allowing the caller to retry. Add GetCachedPlanLocked() as a convenience wrapper that fetches a plan, locks it using the new helper, and retries on invalidation. Convert all current callers to use this wrapper. No intended behavioral change for current callers. They still acquire execution locks on all plan relations as before. This refactoring makes it possible to add more selective locking in follow-on work. Discussion: https://postgr.es/m/ --- src/backend/commands/prepare.c | 7 +- src/backend/executor/functions.c | 6 +- src/backend/executor/spi.c | 14 +-- src/backend/tcop/postgres.c | 2 +- src/backend/utils/cache/plancache.c | 134 +++++++++++++++++++++++----- src/include/utils/plancache.h | 8 +- 6 files changed, 135 insertions(+), 36 deletions(-) diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c index 876aad2100a..467911ea980 100644 --- a/src/backend/commands/prepare.c +++ b/src/backend/commands/prepare.c @@ -195,7 +195,7 @@ ExecuteQuery(ParseState *pstate, entry->plansource->query_string); /* Replan if needed, and increment plan refcount for portal */ - cplan = GetCachedPlan(entry->plansource, paramLI, NULL, NULL); + cplan = GetCachedPlanLocked(entry->plansource, paramLI, NULL, NULL); plan_list = cplan->stmt_list; /* @@ -632,8 +632,9 @@ ExplainExecuteQuery(ExecuteStmt *execstmt, IntoClause *into, ExplainState *es, } /* Replan if needed, and acquire a transient refcount */ - cplan = GetCachedPlan(entry->plansource, paramLI, - CurrentResourceOwner, pstate->p_queryEnv); + cplan = GetCachedPlanLocked(entry->plansource, paramLI, + CurrentResourceOwner, + pstate->p_queryEnv); INSTR_TIME_SET_CURRENT(planduration); INSTR_TIME_SUBTRACT(planduration, planstart); diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c index 88109348817..4ff5f130457 100644 --- a/src/backend/executor/functions.c +++ b/src/backend/executor/functions.c @@ -696,10 +696,8 @@ init_execution_state(SQLFunctionCachePtr fcache) * CurrentResourceOwner will be the same when ShutdownSQLFunction runs.) */ fcache->cowner = CurrentResourceOwner; - fcache->cplan = GetCachedPlan(plansource, - fcache->paramLI, - fcache->cowner, - NULL); + fcache->cplan = GetCachedPlanLocked(plansource, fcache->paramLI, + fcache->cowner, NULL); /* * If necessary, make esarray[] bigger to hold the needed state. diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index 52f3b11301c..58238dd3f34 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -1660,7 +1660,8 @@ SPI_cursor_open_internal(const char *name, SPIPlanPtr plan, */ /* Replan if needed, and increment plan refcount for portal */ - cplan = GetCachedPlan(plansource, paramLI, NULL, _SPI_current->queryEnv); + cplan = GetCachedPlanLocked(plansource, paramLI, NULL, + _SPI_current->queryEnv); stmt_list = cplan->stmt_list; if (!plan->saved) @@ -2101,9 +2102,9 @@ SPI_plan_get_cached_plan(SPIPlanPtr plan) error_context_stack = &spierrcontext; /* Get the generic plan for the query */ - cplan = GetCachedPlan(plansource, NULL, - plan->saved ? CurrentResourceOwner : NULL, - _SPI_current->queryEnv); + cplan = GetCachedPlanLocked(plansource, NULL, + plan->saved ? CurrentResourceOwner : NULL, + _SPI_current->queryEnv); Assert(cplan == plansource->gplan); /* Pop the error context stack */ @@ -2574,9 +2575,8 @@ _SPI_execute_plan(SPIPlanPtr plan, const SPIExecuteOptions *options, * Replan if needed, and increment plan refcount. If it's a saved * plan, the refcount must be backed by the plan_owner. */ - cplan = GetCachedPlan(plansource, options->params, - plan_owner, _SPI_current->queryEnv); - + cplan = GetCachedPlanLocked(plansource, options->params, + plan_owner, _SPI_current->queryEnv); stmt_list = cplan->stmt_list; /* diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index aeaf1c6db8f..3782e5197d4 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2028,7 +2028,7 @@ exec_bind_message(StringInfo input_message) * will be generated in MessageContext. The plan refcount will be * assigned to the Portal, so it will be released at portal destruction. */ - cplan = GetCachedPlan(psrc, params, NULL, NULL); + cplan = GetCachedPlanLocked(psrc, params, NULL, NULL); /* * Now we can define the portal. diff --git a/src/backend/utils/cache/plancache.c b/src/backend/utils/cache/plancache.c index 698e7c1aa22..80987221ef4 100644 --- a/src/backend/utils/cache/plancache.c +++ b/src/backend/utils/cache/plancache.c @@ -101,6 +101,7 @@ static bool choose_custom_plan(CachedPlanSource *plansource, static double cached_plan_cost(CachedPlan *plan, bool include_planner); static Query *QueryListGetPrimaryStmt(List *stmts); static void AcquireExecutorLocks(List *stmt_list, bool acquire); +static bool LockCachedPlan(CachedPlan *cplan, CachedPlanSource *plansource); static void AcquirePlannerLocks(List *stmt_list, bool acquire); static void ScanQueryForLocks(Query *parsetree, bool acquire); static bool ScanQueryWalker(Node *node, bool *acquire); @@ -945,8 +946,9 @@ RevalidateCachedQuery(CachedPlanSource *plansource, * Caller must have already called RevalidateCachedQuery to verify that the * querytree is up to date. * - * On a "true" return, we have acquired the locks needed to run the plan. - * (We must do this for the "true" result to be race-condition-free.) + * On a "true" return, the generic plan remains usable as a cached plan. + * Any execution-time setup, including lock acquisition, is the caller's + * responsibility. */ static bool CheckCachedPlan(CachedPlanSource *plansource) @@ -960,6 +962,25 @@ CheckCachedPlan(CachedPlanSource *plansource) if (!plan) return false; + /* Recheck whether the existing generic plan is still reusable. */ + return RecheckCachedPlan(plan, plansource); +} + +/* + * RecheckCachedPlan: recheck whether a reused generic plan is still reusable. + * + * This is for callers that obtained a reused generic plan from + * GetCachedPlan(..., &is_reused), observed *is_reused == true, and have + * since completed whatever additional setup and execution locking they + * require before execution. + * + * It does not acquire or release execution locks. On failure, it releases + * the generic plan from the plansource. + */ +bool +RecheckCachedPlan(CachedPlan *plan, CachedPlanSource *plansource) +{ + Assert(plan == plansource->gplan); Assert(plan->magic == CACHEDPLAN_MAGIC); /* Generic plans are never one-shot */ Assert(!plan->is_oneshot); @@ -972,8 +993,8 @@ CheckCachedPlan(CachedPlanSource *plansource) plan->is_valid = false; /* - * If it appears valid, acquire locks and recheck; this is much the same - * logic as in RevalidateCachedQuery, but for a plan. + * If it still appears valid, perform the remaining validity checks. This + * is much the same logic as in RevalidateCachedQuery(), but for a plan. */ if (plan->is_valid) { @@ -983,8 +1004,6 @@ CheckCachedPlan(CachedPlanSource *plansource) */ Assert(plan->refcount > 0); - AcquireExecutorLocks(plan->stmt_list, true); - /* * If plan was transient, check to see if TransactionXmin has * advanced, and if so invalidate it. @@ -1000,12 +1019,9 @@ CheckCachedPlan(CachedPlanSource *plansource) */ if (plan->is_valid) { - /* Successfully revalidated and locked the query. */ + /* Successfully revalidated the query. */ return true; } - - /* Oops, the race case happened. Release useless locks. */ - AcquireExecutorLocks(plan->stmt_list, false); } /* @@ -1282,8 +1298,13 @@ cached_plan_cost(CachedPlan *plan, bool include_planner) * plan or a custom plan for the given parameters: the caller does not know * which it will get. * - * On return, the plan is valid and we have sufficient locks to begin - * execution. + * On return, the plan is the current cached plan, but execution locks are + * not held. The caller is responsible for acquiring them before execution. + * Callers that need a plan ready for execution should use + * GetCachedPlanLocked(), which handles locking and retry on invalidation. + * + * If is_reused is not NULL, *is_reused is set true when the returned plan is + * a reused generic plan, and false otherwise. * * On return, the refcount of the plan has been incremented; a later * ReleaseCachedPlan() call is expected. If "owner" is not NULL then @@ -1295,7 +1316,8 @@ cached_plan_cost(CachedPlan *plan, bool include_planner) */ CachedPlan * GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams, - ResourceOwner owner, QueryEnvironment *queryEnv) + ResourceOwner owner, QueryEnvironment *queryEnv, + bool *is_reused) { CachedPlan *plan = NULL; List *qlist; @@ -1309,6 +1331,9 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams, if (owner && !plansource->is_saved) elog(ERROR, "cannot apply ResourceOwner to non-saved cached plan"); + if (is_reused) + *is_reused = false; + /* Make sure the querytree list is valid and we have parse-time locks */ qlist = RevalidateCachedQuery(plansource, queryEnv); @@ -1322,6 +1347,8 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams, /* We want a generic plan, and we already have a valid one */ plan = plansource->gplan; Assert(plan->magic == CACHEDPLAN_MAGIC); + if (is_reused) + *is_reused = true; } else { @@ -1413,6 +1440,48 @@ GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams, return plan; } +/* + * GetCachedPlanLocked: fetch a cached plan ready for execution. + * + * This is a convenience wrapper around GetCachedPlan() that also acquires + * execution locks. + * + * If GetCachedPlan() returns a reused generic plan, this acquires execution + * locks and rechecks plan validity. If the plan is invalidated while locks + * are being acquired, it releases that plan and calls GetCachedPlan() again, + * which will build a fresh plan with the needed locks already held by the + * planner. + * + * Freshly built plans are returned directly, avoiding redundant lock + * acquisition in that case. + * + * This is the normal entry point for callers that need a plan ready to + * execute. Callers that need custom locking behavior should call + * GetCachedPlan() directly and implement their own locking. + */ +CachedPlan * +GetCachedPlanLocked(CachedPlanSource *plansource, ParamListInfo boundParams, + ResourceOwner owner, QueryEnvironment *queryEnv) +{ + CachedPlan *cplan; + bool is_reused; + + cplan = GetCachedPlan(plansource, boundParams, owner, queryEnv, &is_reused); + if (is_reused && !LockCachedPlan(cplan, plansource)) + { + ReleaseCachedPlan(cplan, owner); + cplan = GetCachedPlan(plansource, boundParams, owner, queryEnv, &is_reused); + + /* + * A second GetCachedPlan() call rebuilds the plan rather than reusing + * it. + */ + Assert(!is_reused); + } + + return cplan; +} + /* * ReleaseCachedPlan: release active use of a cached plan. * @@ -1451,10 +1520,10 @@ ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner) * * This function, together with CachedPlanIsSimplyValid, provides a fast path * for revalidating "simple" generic plans. The core requirement to be simple - * is that the plan must not require taking any locks, which translates to - * not touching any tables; this happens to match up well with an important - * use-case in PL/pgSQL. This function tests whether that's true, along - * with checking some other corner cases that we'd rather not bother with + * is that the plan must not require taking any execution locks, which + * translates to not touching any tables; this happens to match up well with an + * important use-case in PL/pgSQL. This function tests whether that's true, + * along with checking some other corner cases that we'd rather not bother with * handling in the fast path. (Note that it's still possible for such a plan * to be invalidated, for example due to a change in a function that was * inlined into the plan.) @@ -1524,7 +1593,7 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource, } /* - * Reject if AcquireExecutorLocks would have anything to do. This is + * Reject if the plan would require taking any execution locks. This is * probably unnecessary given the previous check, but let's be safe. */ foreach(lc, plan->stmt_list) @@ -1549,8 +1618,8 @@ CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource, } /* - * Okay, it's simple. Note that what we've primarily established here is - * that no locks need be taken before checking the plan's is_valid flag. + * Okay, it's simple. What we've primarily established here is that no + * execution locks need be taken before checking the plan's is_valid flag. */ /* Bump refcount if requested. */ @@ -1906,6 +1975,9 @@ QueryListGetPrimaryStmt(List *stmts) /* * AcquireExecutorLocks: acquire locks needed for execution of a cached plan; * or release them if acquire is false. + * + * Acquire or release execution locks on all relations referenced by the + * PlannedStmts in stmt_list. */ static void AcquireExecutorLocks(List *stmt_list, bool acquire) @@ -1955,6 +2027,28 @@ AcquireExecutorLocks(List *stmt_list, bool acquire) } } +/* + * LockCachedPlan + * Acquire execution locks on all relations referenced by a reused + * cached plan, and recheck validity after locking. + * + * Returns false if the plan is invalidated while locks are being acquired, + * in which case any locks acquired by this function have been released. + */ +static bool +LockCachedPlan(CachedPlan *cplan, CachedPlanSource *plansource) +{ + AcquireExecutorLocks(cplan->stmt_list, true); + + if (!RecheckCachedPlan(cplan, plansource)) + { + AcquireExecutorLocks(cplan->stmt_list, false); + return false; + } + + return true; +} + /* * AcquirePlannerLocks: acquire locks needed for planning of a querytree list; * or release them if acquire is false. diff --git a/src/include/utils/plancache.h b/src/include/utils/plancache.h index 7a4a85c8038..87fd1268eac 100644 --- a/src/include/utils/plancache.h +++ b/src/include/utils/plancache.h @@ -240,7 +240,13 @@ extern List *CachedPlanGetTargetList(CachedPlanSource *plansource, extern CachedPlan *GetCachedPlan(CachedPlanSource *plansource, ParamListInfo boundParams, ResourceOwner owner, - QueryEnvironment *queryEnv); + QueryEnvironment *queryEnv, + bool *is_reused); +extern bool RecheckCachedPlan(CachedPlan *cplan, CachedPlanSource *plansource); +extern CachedPlan *GetCachedPlanLocked(CachedPlanSource *plansource, + ParamListInfo boundParams, + ResourceOwner owner, + QueryEnvironment *queryEnv); extern void ReleaseCachedPlan(CachedPlan *plan, ResourceOwner owner); extern bool CachedPlanAllowsSimpleValidityCheck(CachedPlanSource *plansource, -- 2.47.3 ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: GetCachedPlan() refactor: move execution lock acquisition out @ 2026-04-16 02:34 Chao Li <[email protected]> parent: Amit Langote <[email protected]> 0 siblings, 1 reply; 6+ messages in thread From: Chao Li @ 2026-04-16 02:34 UTC (permalink / raw) To: Amit Langote <[email protected]>; +Cc: Kirill Reshke <[email protected]>; pgsql-hackers > On Apr 15, 2026, at 21:46, Amit Langote <[email protected]> wrote: > > On Wed, Apr 15, 2026 at 11:14 AM Amit Langote <[email protected]> wrote: >> On Wed, Apr 15, 2026 at 1:08 AM Kirill Reshke <[email protected]> wrote: >>>> +static bool >>>> +LockCachedPlan(CachedPlan *cplan) >>>> +{ >>>> + AcquireExecutorLocks(cplan->stmt_list, true); >>>> + if (!cplan->is_valid) >>>> + { >>>> + AcquireExecutorLocks(cplan->stmt_list, false); >>>> + return false; >>>> + } >>>> + return true; >>>> +} >>> >>> simply `return cplan->is_valid ` would be more Postgres-y here, isnt it? >> >> Agreed, will fix too. > > Thinking more about this bit, checking only cplan->is_valid after > locking is not really equivalent to what CheckCachedPlan() was doing > before. > > CheckCachedPlan() can also reset plan->is_valid based on other state > (role, xmin), whereas in v1 the new check relied only on lock inval > callbacks to have set is_valid to false. That means the new check is > not enough. It may happen to be okay for the current callers in this > patch, but it is not something I think is safe to rely on once there > are future callers that do arbitrary work, such as ExecutorPrep(), > even if carefully coded, between the original GetCachedPlan() / > CheckCachedPlan() step and the later recheck after locking. > > Separately, I also realized that v1 was introducing redundant lock > acquisition for freshly built plans, which is not really a no behavior > change refactoring either. > > So in v2 I ended up reworking two parts more substantially: > > * The patch now factors the reused-generic-plan validity logic into > RecheckCachedPlan(). > > * It also adds an is_reused output argument to GetCachedPlan(), so > callers can distinguish the reused-plan case from a freshly built plan > and avoid the extra lock step in the latter case. > > -- > Thanks, Amit Langote > <v2-0001-Move-execution-lock-acquisition-out-of-GetCachedP.patch> V2 overall looks good, it preserves the current behavior. I just have a question. RecheckCachedPlan is only called by LockCachedPlan and CheckCachedPlan, and both callers are static, why RecheckCachedPlan is exported? Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/ ^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: GetCachedPlan() refactor: move execution lock acquisition out @ 2026-04-16 02:50 Amit Langote <[email protected]> parent: Chao Li <[email protected]> 0 siblings, 0 replies; 6+ messages in thread From: Amit Langote @ 2026-04-16 02:50 UTC (permalink / raw) To: Chao Li <[email protected]>; +Cc: Kirill Reshke <[email protected]>; pgsql-hackers Hi, On Thu, Apr 16, 2026 at 11:35 AM Chao Li <[email protected]> wrote: > > On Apr 15, 2026, at 21:46, Amit Langote <[email protected]> wrote: > > On Wed, Apr 15, 2026 at 11:14 AM Amit Langote <[email protected]> wrote: > >> On Wed, Apr 15, 2026 at 1:08 AM Kirill Reshke <[email protected]> wrote: > >>>> +static bool > >>>> +LockCachedPlan(CachedPlan *cplan) > >>>> +{ > >>>> + AcquireExecutorLocks(cplan->stmt_list, true); > >>>> + if (!cplan->is_valid) > >>>> + { > >>>> + AcquireExecutorLocks(cplan->stmt_list, false); > >>>> + return false; > >>>> + } > >>>> + return true; > >>>> +} > >>> > >>> simply `return cplan->is_valid ` would be more Postgres-y here, isnt it? > >> > >> Agreed, will fix too. > > > > Thinking more about this bit, checking only cplan->is_valid after > > locking is not really equivalent to what CheckCachedPlan() was doing > > before. > > > > CheckCachedPlan() can also reset plan->is_valid based on other state > > (role, xmin), whereas in v1 the new check relied only on lock inval > > callbacks to have set is_valid to false. That means the new check is > > not enough. It may happen to be okay for the current callers in this > > patch, but it is not something I think is safe to rely on once there > > are future callers that do arbitrary work, such as ExecutorPrep(), > > even if carefully coded, between the original GetCachedPlan() / > > CheckCachedPlan() step and the later recheck after locking. > > > > Separately, I also realized that v1 was introducing redundant lock > > acquisition for freshly built plans, which is not really a no behavior > > change refactoring either. > > > > So in v2 I ended up reworking two parts more substantially: > > > > * The patch now factors the reused-generic-plan validity logic into > > RecheckCachedPlan(). > > > > * It also adds an is_reused output argument to GetCachedPlan(), so > > callers can distinguish the reused-plan case from a freshly built plan > > and avoid the extra lock step in the latter case. > > V2 overall looks good, it preserves the current behavior. > > I just have a question. RecheckCachedPlan is only called by LockCachedPlan and CheckCachedPlan, and both callers are static, why RecheckCachedPlan is exported? Oh, I thought someone would ask that. Yes, it could stay static in plancache.c for now and only be exported later, when some future smarter cached-plan locker outside plancache.c actually needs it. -- Thanks, Amit Langote ^ permalink raw reply [nested|flat] 6+ messages in thread
end of thread, other threads:[~2026-04-16 02:50 UTC | newest] Thread overview: 6+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-04-14 08:19 GetCachedPlan() refactor: move execution lock acquisition out Amit Langote <[email protected]> 2026-04-14 16:08 ` Kirill Reshke <[email protected]> 2026-04-15 02:14 ` Amit Langote <[email protected]> 2026-04-15 13:46 ` Amit Langote <[email protected]> 2026-04-16 02:34 ` Chao Li <[email protected]> 2026-04-16 02:50 ` Amit Langote <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox